Event Sourcing On a Complexity Budget

If you've ever heard Greg Young talk about Event Sourcing (ES), or met one of its enthusiasts, you've probably never gone from "Huh?!" to "Holy S#%T!" faster in your life. And then promptly had an overwhelming sense of dread at the concept of actually implementing an Event Sourced system.

It sounds great! It is great. But how are you supposed to get a team of engineers on board with this idea? How do you get everyone up to speed? How do you balance that with delivering value to the business fast enough that they don't panic? How are you supposed to implement it without causing a catastrophic business failure? Everyone on the team knows how CRUD works. You've been reading about ES for a week/month/aeon, with diminishing returns on your confidence.

Enter the complexity budget.

Put simply, a complexity budget means constraining yourself to a limited amount of complexity: you assign the various elements of your problem and solution a comparative complexity value, then budget how much complexity you're willing to introduce to the team over a given period, such as a quarter.

Let's say we wanted to introduce Event Sourcing using Greg Young's EventStoreDB, and build read models or projections in Postgres. That's going to require:

  • Deploying and managing our own database instance (No managed database service)
  • Learning a new API to interact with that database
  • Learning Event Sourcing
  • Learning the ES database model to build and manage event stores
  • Learning to write ES code
  • Figuring out how to deploy and manage read model populators
  • Figuring out how to implement observability and monitoring of read model populators
  • Figuring out how to deploy new versions of read models and their populators alongside the existing ones
  • Figuring out how to version events

That's a lot for a team to learn, especially if you don't already have an event sourcing expert.

If I had to get a team to grok all of that before trying a new approach, while still delivering business value, it would take months, if not longer, even with a well-developed L&D program.

But what if we broke it down? Can we apply some lean principles to this and create some achievable learning loops?

Let's say our team has a learning velocity of 15 complexity points per quarter. This is "stick a finger in the air and guess" stuff, since we don't measure learning velocity like we measure sprint velocity. You need to rely on an intuitive sense of what's achievable for your team. I like 15 points because if 13 points is one really complex topic, it means we can teach our team one very complex thing each quarter, or a handful of small and medium complexity topics.

We want the 15 points to signify the most our team can learn to use at a production-grade level each quarter. Remember, our craft is learning. We have incoming learnings from product, from new technologies, from new approaches, from keeping up with our colleagues. Every engineer has a learning velocity. Your team's learning velocity is that of the engineer with the lowest learning velocity. If it isn't, you're not a team.

How do we assign complexity values to the list above? Let's start by setting some examples with arbitrary relative values using reference points many people could be familiar with.

  • Learning SASS: 1 complexity point
  • Learning Reactive Extensions for Redux: 8 complexity points

Now we'll assign relative values to the tasks above:

  • Deploying and managing our own database instance (No managed database service): 8
  • Learning a new API to interact with that database: 3
  • Learning Event Sourcing concepts: 8
  • Learning the ES database model to build and manage event stores: 3
  • Learning to write ES code: 3
  • Figuring out how to deploy and manage read model populators: 8
  • Figuring out how to implement observability and monitoring of read model populators: 3
  • Figuring out how to deploy new versions of read models and their populators alongside the existing ones: 3
  • Figuring out how to version events: 2

That's a total of 41 complexity points. How can we fit this into a 15 point budget?

Well, what if instead of deploying EventStoreDB, we used an append-only Postgres table? That takes 14 points off the total (the database deployment, the new API, and the ES database model).
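
To make that feel less abstract, here's a rough sketch of what an append-only event store table could look like, using the pg client. The column names and shape are purely illustrative, not necessarily the exact schema es-reduxed manages for you:

```ts
// A minimal sketch of an append-only event store table, assuming the `pg` client.
// Column names and shape are illustrative only.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function ensureEventsTable(): Promise<void> {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS events (
      position   BIGSERIAL PRIMARY KEY,      -- global ordering of events
      type       TEXT        NOT NULL,       -- e.g. 'OrderPlaced'
      payload    JSONB       NOT NULL,       -- the event data itself
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
  `);
}

// Appending an event is a plain INSERT; rows are never updated or deleted.
export async function appendEvent(type: string, payload: unknown): Promise<number> {
  const { rows } = await pool.query(
    "INSERT INTO events (type, payload) VALUES ($1, $2) RETURNING position",
    [type, JSON.stringify(payload)]
  );
  return Number(rows[0].position);
}
```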

Another 14 points of complexity relate to read model populators: deploying them, monitoring them, and versioning them alongside your read models. There's a way to remove that complexity too, which I'll explain shortly.

That just leaves us with:

  • Learning Event Sourcing concepts: 8
  • Learning to write ES code: 3
  • Figuring out how to version events: 2

These are the foundational components of event sourcing, and they come to a total of just 13 points!

How do we remove read model populators from the equation? There's actually a perfectly good pattern for building state derived from a series of events that many developers are familiar with: Redux.
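
If your team already knows Redux, the mental model carries over almost one-to-one: events play the role of actions, and a reducer folds them into the current state. Here's a tiny sketch; the event names and state shape are invented purely for illustration:

```ts
// Illustrative only: event names and state shape are invented for this example.
interface AccountOpened { type: "AccountOpened"; payload: { accountId: string } }
interface FundsDeposited { type: "FundsDeposited"; payload: { accountId: string; amount: number } }
type DomainEvent = AccountOpened | FundsDeposited;

interface State { balances: Record<string, number> }
const initialState: State = { balances: {} };

// A plain Redux-style reducer: current state + one event => next state.
function reducer(state: State = initialState, event: DomainEvent): State {
  switch (event.type) {
    case "AccountOpened":
      return { balances: { ...state.balances, [event.payload.accountId]: 0 } };
    case "FundsDeposited":
      return {
        balances: {
          ...state.balances,
          [event.payload.accountId]:
            (state.balances[event.payload.accountId] ?? 0) + event.payload.amount,
        },
      };
    default:
      return state;
  }
}

// Rebuilding state is just a fold over the whole event history.
const rebuild = (events: DomainEvent[]): State => events.reduce(reducer, initialState);
```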

Stay with me for a minute; this is all it requires:

  • An event store in an append-only database table
  • Postgres notification topics and subscriptions
  • Replaying events on deployment to rebuild state (there's a bare-bones sketch of this wiring after the list)
  • Replacing Redux-style actions with Event Sourcing events
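
To make that concrete, here's roughly what the wiring could look like if you did it by hand. It's a bare-bones sketch that reuses the events table, reducer, and appendEvent function from the earlier sketches, and it's not a description of es-reduxed's internals:

```ts
// Bare-bones wiring of replay + notifications, assuming the `pg` client and the
// events table, reducer, initialState, DomainEvent, and appendEvent from the earlier sketches.
import { Client, Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

let state = initialState;   // the in-memory projection (Redux state)
let lastPosition = 0;       // highest event position applied so far

// Apply any events we haven't seen yet, strictly in position order.
// (A real implementation would also serialise concurrent catch-ups.)
async function catchUp(): Promise<void> {
  const { rows } = await pool.query(
    "SELECT position, type, payload FROM events WHERE position > $1 ORDER BY position",
    [lastPosition]
  );
  for (const row of rows) {
    state = reducer(state, { type: row.type, payload: row.payload } as DomainEvent);
    lastPosition = Number(row.position);
  }
}

// On deployment: replay the whole history before serving requests...
export async function start(): Promise<void> {
  await catchUp();

  // ...then LISTEN so every instance applies new events as they are raised.
  const listener = new Client({ connectionString: process.env.DATABASE_URL });
  await listener.connect();
  await listener.query("LISTEN events");
  listener.on("notification", () => void catchUp());
}

// Raising an event: append it, then NOTIFY so all instances (including this one) catch up.
export async function publishEvent(type: string, payload: unknown): Promise<void> {
  await appendEvent(type, payload);
  await pool.query("NOTIFY events");
}
```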

In exchange you get:

  • The ability to completely rebuild your data model on deploy as business requirements change
  • No need for database migrations
  • Happy data analysts as your events are now the record of truth about significant business events
  • Happy data analysts because software engineers are now considering event data upfront as it is operationally significant and required to ship features
  • You keep writing and deploying applications largely as usual: high availability, scaling, and observability all work the way they always have

So what about the complexities of a system like the one above? What about snapshots? Ensuring correct ordering of events? Setting up the database table, saving events, and just generally hooking up a Redux implementation to a web server and streaming events?

Well luckily most of that has been encapsulated in a package I've created called es-reduxed.

It ensures that:

  • Events are dispatched once from a single instance per request
  • Events are received by all application instances
  • Events are processed in order
  • Events are replayed on deployment
  • The application does not start serving web requests until it has caught up

All consumers of the package need to do is:

  • Write reducers
  • Ensure GET requests read from the Redux state
  • Ensure POST requests dispatch events

However, if you want to maintain a RESTful API that returns the updated resource, we've even included the ability for the raiseEvent function to asynchronously wait until the new event has been replayed through the reducer and return the updated state via a Promise!
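
In an Express app, for example, the handlers end up very thin. The store and raiseEvent signatures below are simplified for illustration rather than being the exact es-reduxed API, so check the package for the real wiring:

```ts
// Hypothetical Express handlers over an es-reduxed style store.
// The store/raiseEvent shapes are assumed for illustration; see the package for the real API.
import express from "express";

interface State { balances: Record<string, number> }
declare const store: { getState(): State };
declare const raiseEvent: (event: { type: string; payload: unknown }) => Promise<State>;

const app = express();
app.use(express.json());

// Reads come straight from the in-memory Redux state: no query, no joins.
app.get("/accounts/:id/balance", (req, res) => {
  res.json({ balance: store.getState().balances[req.params.id] ?? 0 });
});

// Writes raise an event; the returned Promise resolves once the event has been
// replayed through the reducers, so the response can include the updated resource.
app.post("/accounts/:id/deposits", async (req, res) => {
  const next = await raiseEvent({
    type: "FundsDeposited",
    payload: { accountId: req.params.id, amount: req.body.amount },
  });
  res.status(201).json({ balance: next.balances[req.params.id] });
});
```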

I've built an example application using this approach.

In the long term

This isn't a permanent solution for an event sourcing system. It's a stepping stone to reduce the pain of getting started and build production-level experience on your team.

Those 14 complexity points for read model populators? Come back to them after you've got the system running. Once you can see what the usage patterns are, you can start thinking about which projections could be built as third-normal-form database tables by read model populators.
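
When that time comes, a read model populator is conceptually just another event consumer: instead of folding events into in-memory state, it folds them into normalised tables. A hypothetical sketch, with invented table and event names:

```ts
// A hypothetical read model populator: it consumes the same events, but projects
// them into a normalised Postgres table instead of in-memory Redux state.
// Table and event names are invented for illustration.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function applyToReadModel(event: { type: string; payload: any }): Promise<void> {
  if (event.type === "FundsDeposited") {
    // Keep a per-account balance row up to date as events arrive (or are replayed).
    await pool.query(
      `INSERT INTO account_balances (account_id, balance)
         VALUES ($1, $2)
       ON CONFLICT (account_id)
         DO UPDATE SET balance = account_balances.balance + EXCLUDED.balance`,
      [event.payload.accountId, event.payload.amount]
    );
  }
}
```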

That might even fit nicely within your complexity budget for the next quarter.

Caveats

The first caveat I have is that I'm yet to take this approach on a very large production application. However, that's also not the ideal use case. The best use case for this approach is a greenfield system -- something where you can afford to make some mistakes, in production, while the team learns. As long as the events in your event store have all the data you need, there's nothing tying you to this approach in the long or even medium term. That said, my team is trying this approach in a production system right now. My experience so far has been positive! Even with my attempts to break it.

Also, this approach clearly won't work in some scenarios. If your application state is too large to fit in memory, you will need to persist some or all of the projection to a database or cache. That might take Redux off the table and reintroduce a lot of the complexity we removed.

If you can slice off a small domain and build confidence that way, maybe you can use it in a limited scope. Just remember that you probably want to start with a single event stream for the domain, aligned with your transactional boundary. Otherwise you're going to introduce distributed transactions and compensating events, and the complexity will quickly snowball again.

Conclusion

So do I recommend building a production system this way tomorrow? No. Build a couple of practice applications first. Get a feel for it.

Let me know how you went in the comments!

Photo by Fiona Art from Pexels