Personal Finance Automation

Back when life was simpler, I didn't worry much about tracking my personal finances. I was single and only had a checking account, and never worried too much about overspending. I know how to track my finances, having learned it from my mom and taken some accounting classes in college. Once upon a time, I used GnuCash, but I got out of the habit and never really had much motivation to keep going with it.

Life doesn't stay simple, though, and eventually I realized I needed to be paying much closer attention to my income versus groceries, bills, subscriptions, mortgage, and especially restaurants. When I decided it was time to get serious again, I wanted to see if there was a web-based front-end for GnuCash this time, so I could use my laptop, phone, or any of the plethora of desktop machines around the house, rather than being tethered to my main box in the office. Unfortunately, there really isn't, but while looking, I came across Firefly III. It's not as fully-featured as GnuCash, but I decided it would be good enough for my needs and deployed it in my Kubernetes cluster. It has its warts, like not being a true double-entry accounting system, and having one of the worst container images imaginable. It's search feature is usable, albeit rather slow, and its reports are really nice. What I really enjoy, though, is its automation and integration capabilities. It's got an extremely flexible rule system, and I've written hundreds of rules to automatically classify my transactions. Beyond that, it has a decent HTTP/JSON API, and supports webhooks, so wiring it up to other tools is pretty simple.

Transaction Automation

The key to personal finance management is ensuring that every single transaction is recorded quickly and accurately. Unfortunately, none of the digital banking information providers work with my US banks, so manually entering transactions was effectively the only option.

When I first started using Firefly III, I used its Data Importer to manually upload the CSV exports from my banks' websites every so often. This was slow and cumbersome, and I started to get behind, meaning there were more transactions to import each time so it was even more cumbersome. I wanted to stick with it, so I decided to see about automating the process of downloading the CSV files and sending them to the Data Importer.

First Attempt: xactfetch

My first attempt to automate the process of importing CSV files from the bank sites was a Python script that ran daily on my desktop. It used Playwright to interact with the websites, logging in with my credentials saved in Bitwarden, clicking through the menus to get to the export page, populating the appropriate date range, and downloading the files.

As with most screen scraping projects, this one worked fine at the beginning, but I started to have to spend a lot of time, sometimes a couple of hours every few weeks, dealing with small changes to the sites' layouts or behaviors. Chase was especially difficult, since their site changes constantly, often in ways that are not visible but dramatically affect how the script works.

I did end up moving the script to run in Kubernetes as a CronJob, adding some retry logic, and splitting it to handle each bank independently instead of all in one big run. That helped a bit, but eventually, it became clear that this approach was not going to work in the long term. As the automation started to break more frequently, I became less motivated to fix it, so I was getting back into the slump of going for weeks without recording any transactions.

Second Attempt: xactmon

Back to the drawing board, I tried to find what other options were available. I found the Plaid Transactions API, which supposedly is free for hobbyists, but I didn't want to be dependent on a cloud service. At some point, I realized, (almost) all of my banks actually already push transaction data to me! They send me emails whenever I make a charge or receive a deposit, and for the most part, those emails contain everything I need to enter into Firefly. So really, what I need is a tool to parse those emails and convert them into transactions.

xactmon is split into three distinct components:

  • The receiver, which listens for notifications from the banks
  • The processor, which parses the raw bank messages and extracts the transaction details
  • The importer, which inserts the transactions

The components communicate with one another via message queues using AMQP. This architecture affords adding more implementations of the receiver and importer without changing anything else, and adds a degree of robustness by allowing individual components to fail without losing data, since unprocessed messages will remain on the queue.

BanksFastMailReceiverProcessorImporterRabbitMQFirefly IIISMTPJMAPAMQPAMQPAMQPAMQPHTTP

xactmon Receiver: JMAP Client

The entry point into xactmon for transaction notifications is the receiver. I originally imagined I may need multiple receiver implementations, perhaps for e-mail and SMS, but for now, e-mail has been sufficient.

Since I use Fastmail, I implemented the receiver as a JMAP client. If I ever have to change e-mail providers to one that doesn't support JMAP, I would need an IMAP client implementation instead.

xactmon is written in Rust, and there is a jmap-client crate available that does the heavy lifting. Unfortunately, its maintenance is "paused," and it has a bug that prevents it from working with Fastmail. The fix was simple, and Cargo makes it easy to use a fork of a crate, so I'm carrying the patch myself for now.

Using Fastmail server-side filtering, I set up rules to apply a specific Transactions label to the messages from the banks that contain transaction notifications. The xactmon receiver will only process messages with this label. Unfortunately, the JMAP protocol does not have a way to subscribe only to events pertaining to a certain folder/label, so the receiver gets a notification for every mailbox change, and then has to check to see if the change affected any messages with this label. This means it wakes up only to find nothing to do quite frequently.

When the receiver does find a new message, it fetches its contents and some metadata like sender and date, then publishes a message to an AMQP exchange to be processed. If the message is published successfully, the receiver will mark the e-mail message as "read," so it will not be processed again. If publishing to the exchange fails for some reason, the e-mail will not be marked as read, so it can attempt to handle it again later.

xactmon Processor

Once a transaction notification has been received and processed by the receiver, it is placed onto a queue for the processor to handle. At this stage, the raw content of the message is parsed, extracting the transaction details such as date, amount, and description. The most important bit is the amount, of course, but some banks provide more details than others.

The first step the processor takes is converting HTML messages to plain text, removing all markup and styling. Then, is passes the sanitized message through a list of rules. Each rule specifies a match criterion, which is a regular expression matched against the notification sender or the message contents. If the rule matches, the text is then passed to the parser defined by the rule. There are currently two parser implementations: a simple regex parser that expects a regular expression with named capture groups, and an awk parser that executes an AWK program. Messages with a simple structure can generally use the regex parser, but messages with complex multi-line information may need to use the awk parser. In either case, the parser must extract at least the amount from the message, as well as the account, date, and description, if available. If the notification does not include a date, the date the notification was received will be used. If it does not include a description (or vendor name), then a placeholder value provided by the rule definition will be used. A static account name attached to the rule will be used if the notification did not include one.

If the message can be parsed successfully and all of the fields extracted, a new structured message will be published to another AMQP exchange. If not, the original AMQP delivery will be rejected. Eventually, I would like to set up a dead-letter exchange and a GUI for inspecting the notifications that could not be processed, but for now, the rejected messages will just go nowhere.

Receive delivery from queueConvert HTML message to plain textWhich rule parser?Parse message with regular expressionPass message to awk programMessage parsed successfully?yesnoPublish structured dataRequeue deliverynosuccess?yesReject deliveryRule matches sender or bodynoFor each ruleMore rules?yesReject delivery#darkgrayregexawkno

xactmon Importer: Firefly III

The final step in the process is to take the structured data the processor extracted from the notification message and create a Firefly transaction from them. All of the required information, date, amount, account, and description, are available from the message received from the processor via the message queue. These are passed to the Firefly HTTP API.

Since the account name included in the bank's notification may not match the name in Firefly III, the importer will first search for an account whose name, account number, or notes matches the value from the notification.

If there is a problem importing the transaction into Firefly III, the message will be returned to the queue so it can be tried again later. As a special case, however, the importer can specifically handle the error returned by Firefly III when a transaction is automatically deleted by an automation rule. This feature is important because some notifications can be delivered twice, for example once when the transaction is authorized and again when it clears.

Caveats and Limitations

Overall, this system works extremely well. It's less fragile than the old screen-scraping approach, and works in near real-time, at least for some banks. Unfortunately, there are a couple of classes of transaction for which it is not sufficient. Most notably are gas station purchases; my credit card bank does not send transaction notifications by email for these at all. I do get an SMS notification, but it comes before the purchase is complete, so it does not include the amount, only the vendor and date. Another problematic type of transaction is the traditional table service restaurant. The transaction notification comes from the bank as soon as the credit card is processed, so the amount is the total before gratuity. There is no follow-up notification with the final amount, either.

For both of these types of transactions, I have to be a bit old fashioned and keep the receipt. For gas stations, I have to enter the transaction manually, and for restaurants, I only need to fix the amount.

There's another type of transaction that's a bit annoying: Wal-Mart curbside pickup orders. Wal-Mart places a hold for some arbitrary amount above the order total, which comes through as a regular transaction notification. Later, when the order is picked up, they replace the hold with another one that's closer to the total on the receipt, triggering another notification. Eventually, after a few days, the actual charge clears, usually for some amount within 1¢ and $3 of the previous total, but does not trigger a notification. There's practically no way to deal with this mess, as nothing from Wal-Mart ever shows the value they finally charge. The best I can do with these is fix them manually when the bank statement comes in at the end of the month.

Lessons

Obviously, the set of potential users of xactmon is quite limited. I imagine the intersection of people who want to track their personal finances and those who can run RabbitMQ and a bunch of Rust programs is pretty small. If I were to try to make this a more generally-approachable system, I would probably try to combine all of the roles into a single process, and eliminate the message broker entirely. I originally wanted the message broker to help avoid data loss, keeping messages around until they can be processed end-to-end successfully. Since I ended up only using e-mail as a source, this turned out to be a non-issue. I had envisioned needing a webhook-type source as well, to handle SMS notifications, etc., which would have needed more robustness in the application.

Although this system helps me keep my records really close to what the bank shows most of the time, there's really no way to be exact. There are plenty of cases where the transaction notifications are wrong, incomplete, or simply missing. For these, I still have to be dilligent in keeping track of receipts. Reconciling statements each month is still absolutely required as well.