Event Sourcing On a Complexity Budget

If you've ever heard Greg Young talk about Event Sourcing (ES), or met one of its enthusiasts, you've probably never gone from "Huh?!" to "Holy S#%T!" faster in your life. And then promptly had an overwhelming sense of dread at the concept of actually implementing an Event Sourced system.

It sounds great! It is great. But how are you supposed to get a team of engineers on board with this idea? How do you get everyone up to speed? How do you balance that with delivering value to the business fast enough that they don't panic? How are you supposed to implement it without causing a catastrophic business failure? Everyone on the team knows how CRUD works. You've been reading about ES for a week/month/aeon, with diminishing returns on your confidence.

Enter the complexity budget.

Put simply, a complexity budget means constraining yourself to a limited amount of complexity: you assign the various elements of your problem and solution a comparative complexity value, then budget how much complexity you're willing to introduce to the team over a given period, such as a quarter.

Let's say we wanted to introduce Event Sourcing using Greg Young's EventStoreDB, and build read models or projections in Postgres. That's going to require:

  • Deploying and managing our own database instance (No managed database service)
  • Learning a new API to interact with that database
  • Learning Event Sourcing
  • Learning the ES database model to build and manage event stores
  • Learning to write ES code
  • Figuring out how to deploy and manage read model populators
  • Figuring out how to implement observability and monitoring of read model populators
  • Figuring out how to deploy new versions of read models and their populators alongside the existing ones
  • Figuring out how to version events

That's a lot for a team to learn, especially if you don't already have an event sourcing expert.

If I had to get a team to grok all of that before trying a new approach, while still delivering business value, it would take months, if not longer, even with a well-developed L&D program.

But what if we broke it down? Can we apply some lean principles to this and create some achievable learning loops?

Let's say our team has a learning velocity of 15 complexity points per quarter. This is "stick a finger in the air and guess" stuff, since we don't measure learning velocity like we measure sprint velocity. You need to rely on an intuitive sense of what's achievable for your team. I like 15 points because if 13 points is one really complex topic, it means we can teach our team one very complex thing each quarter, or a handful of small and medium complexity topics.

We want the 15 points to signify the most our team can learn to use at a production-grade level each quarter. Remember, our craft is learning. We have incoming learnings from product, from new technologies, from new approaches, from keeping up with our colleagues. Every engineer has a learning velocity. Your team's learning velocity is that of the engineer with the lowest learning velocity. If it isn't, you're not a team.

How do we assign complexity values to the list above? Let's start by setting some examples with arbitrary relative values using reference points many people could be familiar with.

  • Learning SASS: 1 complexity point
  • Learning Reactive Extensions for Redux: 8 complexity points

Now we'll assign relative values to the tasks above:

  • Deploying and managing our own database instance (No managed database service): 8
  • Learning a new API to interact with that database: 3
  • Learning Event Sourcing concepts: 8
  • Learning the ES database model to build and manage event stores: 3
  • Learning to write ES code: 3
  • Figuring out how to deploy and manage read model populators: 8
  • Figuring out how to implement observability and monitoring of read model populators: 3
  • Figuring out how to deploy new versions of read models and their populators alongside the existing ones: 3
  • Figuring out how to version events: 2

That's a total of 41 complexity points. How can we fit this into a 15 point budget?

Well, what if instead of deploying EventStoreDB, we used an append-only Postgres table? That takes 14 points off the total (the database deployment, the new API, and the ES database model).
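
To make that feel less abstract, here's a rough sketch of what an append-only event store table could look like, using the pg client. The column names and shape are purely illustrative, not necessarily the exact schema es-reduxed manages for you:

```ts
// A minimal sketch of an append-only event store table, assuming the `pg` client.
// Column names and shape are illustrative only.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function ensureEventsTable(): Promise<void> {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS events (
      position   BIGSERIAL PRIMARY KEY,      -- global ordering of events
      type       TEXT        NOT NULL,       -- e.g. 'OrderPlaced'
      payload    JSONB       NOT NULL,       -- the event data itself
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
  `);
}

// Appending an event is a plain INSERT; rows are never updated or deleted.
export async function appendEvent(type: string, payload: unknown): Promise<number> {
  const { rows } = await pool.query(
    "INSERT INTO events (type, payload) VALUES ($1, $2) RETURNING position",
    [type, JSON.stringify(payload)]
  );
  return Number(rows[0].position);
}
```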

Another 14 points of complexity relate to read model populators: deploying them, monitoring them, and versioning them alongside your read models. There's a way to remove that complexity too, which I'll explain shortly.

That just leaves us with:

  • Learning Event Sourcing concepts: 8
  • Learning to write ES code: 3
  • Figuring out how to version events: 2

These are the foundational components of event sourcing, and they come to a total of just 13 points!

How do we remove read model populators from the equation? There's actually a perfectly good pattern for building state derived from a series of events that many developers are familiar with: Redux.
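
If your team already knows Redux, the mental model carries over almost one-to-one: events play the role of actions, and a reducer folds them into the current state. Here's a tiny sketch; the event names and state shape are invented purely for illustration:

```ts
// Illustrative only: event names and state shape are invented for this example.
interface AccountOpened { type: "AccountOpened"; payload: { accountId: string } }
interface FundsDeposited { type: "FundsDeposited"; payload: { accountId: string; amount: number } }
type DomainEvent = AccountOpened | FundsDeposited;

interface State { balances: Record<string, number> }
const initialState: State = { balances: {} };

// A plain Redux-style reducer: current state + one event => next state.
function reducer(state: State = initialState, event: DomainEvent): State {
  switch (event.type) {
    case "AccountOpened":
      return { balances: { ...state.balances, [event.payload.accountId]: 0 } };
    case "FundsDeposited":
      return {
        balances: {
          ...state.balances,
          [event.payload.accountId]:
            (state.balances[event.payload.accountId] ?? 0) + event.payload.amount,
        },
      };
    default:
      return state;
  }
}

// Rebuilding state is just a fold over the whole event history.
const rebuild = (events: DomainEvent[]): State => events.reduce(reducer, initialState);
```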

Stay with me for a minute; this is all it requires:

  • An event store in an append-only database table
  • Postgres notification topics and subscriptions
  • Replaying events on deployment to rebuild state (there's a bare-bones sketch of this wiring after the list)
  • Replacing Redux-style actions with Event Sourcing events
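
To make that concrete, here's roughly what the wiring could look like if you did it by hand. It's a bare-bones sketch that reuses the events table, reducer, and appendEvent function from the earlier sketches, and it's not a description of es-reduxed's internals:

```ts
// Bare-bones wiring of replay + notifications, assuming the `pg` client and the
// events table, reducer, initialState, DomainEvent, and appendEvent from the earlier sketches.
import { Client, Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

let state = initialState;   // the in-memory projection (Redux state)
let lastPosition = 0;       // highest event position applied so far

// Apply any events we haven't seen yet, strictly in position order.
// (A real implementation would also serialise concurrent catch-ups.)
async function catchUp(): Promise<void> {
  const { rows } = await pool.query(
    "SELECT position, type, payload FROM events WHERE position > $1 ORDER BY position",
    [lastPosition]
  );
  for (const row of rows) {
    state = reducer(state, { type: row.type, payload: row.payload } as DomainEvent);
    lastPosition = Number(row.position);
  }
}

// On deployment: replay the whole history before serving requests...
export async function start(): Promise<void> {
  await catchUp();

  // ...then LISTEN so every instance applies new events as they are raised.
  const listener = new Client({ connectionString: process.env.DATABASE_URL });
  await listener.connect();
  await listener.query("LISTEN events");
  listener.on("notification", () => void catchUp());
}

// Raising an event: append it, then NOTIFY so all instances (including this one) catch up.
export async function publishEvent(type: string, payload: unknown): Promise<void> {
  await appendEvent(type, payload);
  await pool.query("NOTIFY events");
}
```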

In exchange you get:

  • The ability to completely rebuild your data model on deploy as business requirements change
  • No need for database migrations
  • Happy data analysts as your events are now the record of truth about significant business events
  • Happy data analysts because software engineers are now considering event data upfront as it is operationally significant and required to ship features
  • You keep writing and deploying applications largely as usual: high availability, scaling, and observability all work the way they always have

So what about the complexities of a system like the one above? What about snapshots? Ensuring correct ordering of events? Setting up the database table, saving events, and just generally hooking up a Redux implementation to a web server and streaming events?

Well luckily most of that has been encapsulated in a package I've created called es-reduxed.

It ensures that:

  • Events are dispatched once from a single instance per request
  • Events are received by all application instances
  • Events are processed in order
  • Events are replayed on deployment
  • The application does not start serving web requests until it has caught up

All consumers of the package need to do is:

  • Write reducers
  • Ensure GET requests read from the Redux state
  • Ensure POST requests dispatch events

However, if you want to maintain a RESTful API that returns the updated resource, we've even included the ability for the raiseEvent function to asynchronously wait until the new event has been replayed through the reducer and return the updated state via a Promise!
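
In an Express app, for example, the handlers end up very thin. The store and raiseEvent signatures below are simplified for illustration rather than being the exact es-reduxed API, so check the package for the real wiring:

```ts
// Hypothetical Express handlers over an es-reduxed style store.
// The store/raiseEvent shapes are assumed for illustration; see the package for the real API.
import express from "express";

interface State { balances: Record<string, number> }
declare const store: { getState(): State };
declare const raiseEvent: (event: { type: string; payload: unknown }) => Promise<State>;

const app = express();
app.use(express.json());

// Reads come straight from the in-memory Redux state: no query, no joins.
app.get("/accounts/:id/balance", (req, res) => {
  res.json({ balance: store.getState().balances[req.params.id] ?? 0 });
});

// Writes raise an event; the returned Promise resolves once the event has been
// replayed through the reducers, so the response can include the updated resource.
app.post("/accounts/:id/deposits", async (req, res) => {
  const next = await raiseEvent({
    type: "FundsDeposited",
    payload: { accountId: req.params.id, amount: req.body.amount },
  });
  res.status(201).json({ balance: next.balances[req.params.id] });
});
```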

I've built an example application using this approach.

In the long term

This isn't a permanent solution for an event sourcing system. It's a stepping stone to reduce the pain of getting started and build production-level experience on your team.

Those 14 complexity points for read model populators? Come back to them after you've got the system running. Once you can see what the usage patterns are, you can start thinking about which projections could be built as third-normal-form database tables by read model populators.
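
When that time comes, a read model populator is conceptually just another event consumer: instead of folding events into in-memory state, it folds them into normalised tables. A hypothetical sketch, with invented table and event names:

```ts
// A hypothetical read model populator: it consumes the same events, but projects
// them into a normalised Postgres table instead of in-memory Redux state.
// Table and event names are invented for illustration.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function applyToReadModel(event: { type: string; payload: any }): Promise<void> {
  if (event.type === "FundsDeposited") {
    // Keep a per-account balance row up to date as events arrive (or are replayed).
    await pool.query(
      `INSERT INTO account_balances (account_id, balance)
         VALUES ($1, $2)
       ON CONFLICT (account_id)
         DO UPDATE SET balance = account_balances.balance + EXCLUDED.balance`,
      [event.payload.accountId, event.payload.amount]
    );
  }
}
```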

That might even fit nicely within your complexity budget for the next quarter.

Caveats

The first caveat I have is that I'm yet to take this approach on a very large production application. However, that's also not the ideal use case. The best use case for this approach is a greenfield system -- something where you can afford to make some mistakes, in production, while the team learns. As long as the events in your event store have all the data you need, there's nothing tying you to this approach in the long or even medium term. That said, my team is trying this approach in a production system right now. My experience so far has been positive! Even with my attempts to break it.

Also, this approach clearly won't work in some scenarios. If your application state is too large to fit in memory, you will need to persist some or all of the projection to a database or cache. That might take Redux off the table and reintroduce a lot of the complexity we removed.

If you can slice off a small domain and build confidence that way, maybe you can use it in a limited scope. Just remember that you probably want to start with a single event stream for the domain, aligned with your transactional boundary. Otherwise you're going to introduce distributed transactions and compensating events, and the complexity will quickly snowball again.

Conclusion

So do I recommend building a production system this way tomorrow? No. Build a couple of practice applications first. Get a feel for it.

Let me know how you went in the comments!

Photo by Fiona Art from Pexels