Comment by dagss

1 year ago

I'm not arguing that event sourcing should be done for its own sake, so I don't really want to disagree with you; that said, your post doesn't perfectly resonate with me either.

When you write a typical backend system, the desired function of the system is to interact with the external world. Without I/O the system may as well not exist.

Input is a desire from someone that something be done, or a record that something happened. Such input changes the data recorded, or appends to what the system should know / has seen. This is an "event".

All input can be framed as being an event. But "an element's mass was modified" is not an event ... it doesn't describe someone or something giving input to our system.

The algebraic view on things you take seems to be treating the system at a different level than what I think about as event sourcing.

Neither "an element's mass was modified" nor the "sell" or "transaction" you mention is a realistic event. An event is "User U clicked a button in the web console to Sell share S at time T". Implementing the effects of that event -- computing a specific read model resulting from appending the event to the stored set of events -- may well be best done by some algebra like you suggest, but that seems like another topic.
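
To make "computing a read model from the stored set of events" concrete, here is a minimal sketch. The `SellClicked` shape, the `quantity` field and the `holdings` function are all hypothetical names for illustration, not something from a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SellClicked:
    # Hypothetical event: "User U clicked Sell for share S at time T".
    user: str
    share: str
    quantity: int  # assumed field, not part of the sentence above
    at: str        # timestamp exactly as the input recorded it


def holdings(initial, events):
    """Read model: shares still held after replaying every stored event."""
    result = dict(initial)
    for event in events:
        result[event.share] = result.get(event.share, 0) - event.quantity
    return result


# The log is only ever appended to; the read model is recomputed from it.
log = [SellClicked("U", "S", 3, "2024-01-01T10:00:00Z")]
log.append(SellClicked("U", "S", 2, "2024-01-01T10:05:00Z"))
model = holdings({"S": 10}, log)
```

The stored data is the list of inputs; `holdings` is just one of many possible views derived from it.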

You seem to talk about models for computing and transforming state. I talk about I/O vs data storage.

> All input can be framed as being an event.

Sure, it could be, but is it useful to do that? If I stand up and shout "The price of a Banana is $4 per bushel!", you could record my voice and upload it as a raw wave file. That's the rawest "input event" you can come up with. Or you could write down "some random dude said that bananas cost $4 around 4:30 pm and I'm not sure whether I believe him or not". That's not the "raw input", it's been transcribed and modified and annotated. Yet it's almost certainly more useful to your system, and it's kinda like event sourcing. Kinda.

The problem with worrying about whether something is "input" or "output" or "internal" is that you can just move the dotted line anywhere around your system to change those. If you break a monolith into independent reusable building blocks, those building blocks are going to have a completely different idea of what counts as input and output. But who cares? You're not changing any fundamental truth about how the domain works. Your domain model should really be independent of worrying about what's "input" and "output". Those lines move all the time. Instead think about what operations make sense to do with your data, and then think about the mathematical properties of those operations.

> But "an element's mass was modified" is not an event ... it doesn't describe someone or something giving input to our system.

Sure it does. Someone gave you the input that a particular element has a particular mass. How is that not input? How else did you get that data?

> The algebraic view on things you take seems to be treating the system at a different level than what I think about as event sourcing.

This is my exact point. You should think about event sourcing this way, because that's the only reason it's useful: it's accidentally a source of important "domain algebra" that you otherwise might miss. But there's lots of other important "domain algebra" that you are still missing, and they don't necessarily look like event sourcing.

> An event is "User U clicked a button in the web console to Sell share S at time T".

But surely that's not what you're storing in your system! That would be an extreme coupling between the concept of "selling shares" and "clicking a button". Those are completely unrelated ideas! Why would you want to tightly couple them!? If that's what you think event sourcing is, sorry to be blunt, but you have very badly misunderstood it.

  • Fair points. But how do I get from a "domain algebra" to practical implementation with a popular database?

    Event sourcing can be translated to adding rows to tables, or adding documents to collections. Focusing so much on "append" isn't only because of what kind of events you would model, but because you store data in databases by, well, storing it... appending it.

    If event sourcing is only useful as a source of an important "domain algebra": How does a domain algebra translate to practical use of a database system for your application? How can I focus on the "operations I want to do on my data", when the tools I am given are pretty much INSERT/UPDATE/SELECT, or GET/PUT, or some variation along these lines?

    • Bear with me, we're going on a bit of a walk.

      An abstraction layer says: here are some operations you can do, operating on some models. Presumably those models are a useful or convenient perspective on some underlying data, and those operations are important or useful as well. At this level, we can define something like "order a pizza", "renew my driver's license", or "show me a book I might like to read". Note the abstraction is driven by the needs of the users/consumers/callers of the abstraction, not its implementations. This sounds obvious, but it's extremely often done the wrong way around -- "Foo : IFoo" shouldn't be muscle memory, and it should mean "I can't currently think of any other implementation of IFoo right now, although that might change", and NOT mean "this interface is the header of this class, which we have to do or we get yelled at".

      The implementation of an abstraction layer breaks down an operation in one domain into operations in another (presumably "lower level") domain. It's possible that "renew my driver's license" can be reasonably written directly in a few SQL statements. Okay, that's no big deal. Most likely you have a few intermediate layers, or an ORM, or whatever. So the implementation of the "DMV API" (or whatever is defining that operation) is where you break it down into INSERT/UPDATE/SELECT or whatever other lower-level tools you have.

      None of this changes when you start thinking about your higher-level operations algebraically. All that really means is that you consider the operations and their inputs/outputs, and you start asking questions like: are these idempotent? Associative? Commutative? Do they have identity? You can ask those questions at the abstraction level. They are part of the definition of the operation. It's up to the implementation to make sure those properties are respected.
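
      As a rough sketch of what asking those questions can look like in code, the helpers below check a property over a handful of sample values. All names are made up for illustration, and a check over samples can only refute a property, never prove it.

```python
from itertools import product


def is_idempotent(op, states):
    """Applying op twice leaves the same state as applying it once."""
    return all(op(op(s)) == op(s) for s in states)


def is_commutative(combine, values):
    return all(combine(a, b) == combine(b, a)
               for a, b in product(values, repeat=2))


def is_associative(combine, values):
    return all(combine(combine(a, b), c) == combine(a, combine(b, c))
               for a, b, c in product(values, repeat=3))


def is_identity(combine, e, values):
    return all(combine(e, a) == a and combine(a, e) == a for a in values)


samples = [0, 1, 2, 7]


def increment(n):
    return n + 1          # "do it again" changes the result


def clamp_to_five(n):
    return min(n, 5)      # "do it again" changes nothing


def subtract(a, b):
    return a - b


checks = {
    "increment idempotent": is_idempotent(increment, samples),
    "clamp idempotent": is_idempotent(clamp_to_five, samples),
    "max commutative": is_commutative(max, samples),
    "max associative": is_associative(max, samples),
    "0 is identity for max": is_identity(max, 0, samples),
    "subtract commutative": is_commutative(subtract, samples),
}
```

      Checks like these belong next to the definition of the operation, so any implementation can be held to them.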

      Look for ways to change the operations you offer to be as "mathematical" as you can. The more mathematical they are, the more you'll serendipitously come up with new and interesting and useful ways to use them; the more reusable they'll be.

      "Renew my driver's license" is idempotent; "order a pizza" is not. Prefer idempotency. Can we make "order a pizza" idempotent? Sure. The client generates (or requests) a unique ID for the order. We have "update an order", which may actually be a family of operations. We have "finalize an order" which places it. Finalizing the same order twice does nothing -- it's idempotent.

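      A minimal in-memory sketch of that idea, assuming a hypothetical `OrderBook` with `update_order` and `finalize_order`; a real system would keep this in a database rather than in dicts.

```python
import uuid


class OrderBook:
    """Hypothetical store; the two dicts stand in for database tables."""

    def __init__(self):
        self.drafts = {}  # order_id -> items still being edited
        self.placed = {}  # order_id -> items sent to the kitchen

    def update_order(self, order_id, items):
        if order_id in self.placed:
            raise ValueError("order already finalized")
        self.drafts[order_id] = list(items)

    def finalize_order(self, order_id):
        # Idempotent: the first call places the order, repeats are no-ops.
        if order_id not in self.placed:
            self.placed[order_id] = tuple(self.drafts.pop(order_id))
        return self.placed[order_id]


book = OrderBook()
order_id = str(uuid.uuid4())  # generated by the client, not the server
book.update_order(order_id, ["large pepperoni"])
first = book.finalize_order(order_id)
second = book.finalize_order(order_id)  # e.g. a retry after a timeout
```

      Because the client owns the ID, a retried request refers to the same order instead of creating a second one.
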
      User story: a bunch of guys are sitting around and want to order from your pizza place, but it's always annoying to pass the phone around; everyone wants to look at the menu at once. We want "distributed ordering", so a party can all contribute what they want to the same order. They want to see what everyone else is doing in real-time. I want breadsticks, but there's only 5 of us. If anyone has ordered breadsticks, we're all good. If we have an "add breadsticks" operation and poor concurrency, we end up with 5 breadsticks in our cart; someone has to notice that and fix it. Not great.

      So what's a better operation than "add breadsticks"? How about "make sure there are enough breadsticks"? If three people all say "make sure there are enough breadsticks!" that's basically the same as one person saying it. That's an important domain operation. If all you're doing is thinking "event sourcing", then "add breadsticks" looks more like a domain event than "make sure there are enough breadsticks", but the latter is actually easier to work with and leads to a better experience. You can't make the jump from "add breadsticks" to "make sure there are enough breadsticks" just by thinking about event sourcing -- you get there by thinking about math.
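
      Here is a small sketch of the difference, assuming a cart is just a dict of item counts. `add_breadsticks` and `ensure_breadsticks` are illustrative names, and "enough" is simplified to "at least `minimum` orders".

```python
def add_breadsticks(cart):
    """'Add' semantics: every request increments the count."""
    updated = dict(cart)
    updated["breadsticks"] = updated.get("breadsticks", 0) + 1
    return updated


def ensure_breadsticks(cart, minimum=1):
    """'Ensure' semantics: raise the count to at least `minimum`."""
    updated = dict(cart)
    updated["breadsticks"] = max(updated.get("breadsticks", 0), minimum)
    return updated


empty = {}
# Three people hit the button at about the same time.
added = add_breadsticks(add_breadsticks(add_breadsticks(empty)))
ensured = ensure_breadsticks(ensure_breadsticks(ensure_breadsticks(empty)))
```

      `max` is idempotent, commutative and associative, so it doesn't matter how many people ask or in what order their requests arrive.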

      What happens when someone removes a pizza from the cart at the same time as someone else is adding pepperoni to it? This is the "Google Docs" problem. We need to think about associativity and commutativity to really solve these problems, not to mention random vs. sequential identity. Can you undo these operations? If I say "extra sauce", and someone else says "no, light sauce", we might have a conflict to resolve. But if I then undo my "set extra sauce" action -- what happens? It should clearly result in "light sauce". That's the obviously correct answer, so we can work back from that to figure out how the "set sauce amount" operation should work. "User A requests heavy sauce on pizza AF73" is idempotent and reversible. "User B requests light sauce on pizza AF73". "User A revokes their request for heavy sauce on pizza AF73". Okay, great, we're golden. "User C removes pizza AF73 from their order." "User D requests pepperoni on pizza AF73". "User C undoes their operation to remove pizza AF73 from their order." Great: pizza AF73 has pepperoni on it, despite the fact that it had been removed when pepperoni was added. No problem here.

      How do you handle statements like "User A revokes their request for heavy sauce on pizza AF73"? Event sourcing would say that this is a distinct event that you have to INSERT in your append-only log. But why not just DELETE the initial request? That works just fine. In any case, the low-level steps are the easy part.
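
      A sketch of the request/revoke model, with revoking implemented as a plain delete. `SharedOrder` and its methods are hypothetical, and actually resolving a heavy-vs-light conflict is left to the caller.

```python
class SharedOrder:
    """Active requests are a set, so request and revoke are both idempotent."""

    def __init__(self):
        self.requests = set()  # (user, pizza, kind, value)

    def request(self, user, pizza, kind, value=None):
        self.requests.add((user, pizza, kind, value))

    def revoke(self, user, pizza, kind, value=None):
        # The DELETE option: drop the original request, no new event.
        self.requests.discard((user, pizza, kind, value))

    def values(self, pizza, kind):
        return sorted(v for (_user, p, k, v) in self.requests
                      if p == pizza and k == kind)

    def is_removed(self, pizza):
        return any(p == pizza and k == "remove"
                   for (_user, p, k, _value) in self.requests)


order = SharedOrder()
order.request("A", "AF73", "sauce", "heavy")
order.request("B", "AF73", "sauce", "light")
conflict = order.values("AF73", "sauce")      # two live requests
order.revoke("A", "AF73", "sauce", "heavy")
sauce = order.values("AF73", "sauce")

order.request("C", "AF73", "remove")
removed_midway = order.is_removed("AF73")
order.request("D", "AF73", "topping", "pepperoni")
order.revoke("C", "AF73", "remove")
toppings = order.values("AF73", "topping")
```

      The pepperoni request survives the removal because removal is just another revocable request, not a destructive edit of the pizza.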

      We're not too far from event sourcing here, but that emerged from analyzing the problem and trying to make it as "mathematically pure" as we could. We didn't start with event sourcing, and we saw another example (add breadsticks) where the "event sourced" version was worse. Event sourcing is a trick, but not the goal. The goal is "domain algebra".

      On the back end, you'll be doing reporting on these numbers. You want a really flexible reporting system? There was a buzzword a few years ago, "data cube". It was a buzzword because nobody could define what it meant, but after thinking about it for a while, I decided it could have a useful definition: A data cube is a pairing of a set of categorical data (pizza sizes, customer types, order times, whatever) and value data (costs, amounts, prep times, ratings). Each "value data" must have a monoid defined over it, which means some binary "add" (or "combine") method with identity. Any time you have a monoid defined, you get a (distributed!) "aggregate" method for free. That's what "roll up" and "drill down" are, in report-speak: projecting your categorical data and aggregating all your value data using the defined monoid. The same kind of analysis applies here, even though event sourcing has nothing to do with any of this.
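
      A minimal sketch of that definition, assuming rows are (categories, value) pairs. `Monoid`, `roll_up` and the sample rows are illustrative names and made-up data, not any real reporting API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    identity: Any
    combine: Callable[[Any, Any], Any]


def roll_up(rows, keep, monoid):
    """Project each row's categories onto `keep`, then combine per group."""
    totals = defaultdict(lambda: monoid.identity)
    for categories, value in rows:
        key = tuple(categories[name] for name in keep)
        totals[key] = monoid.combine(totals[key], value)
    return dict(totals)


total_cost = Monoid(identity=0.0, combine=lambda a, b: a + b)

rows = [  # made-up sample data
    ({"size": "large", "customer": "walk-in"}, 18.0),
    ({"size": "large", "customer": "online"}, 20.0),
    ({"size": "small", "customer": "online"}, 9.0),
]

by_size = roll_up(rows, ["size"], total_cost)
by_size_and_customer = roll_up(rows, ["size", "customer"], total_cost)
grand_total = roll_up(rows, [], total_cost)
```

      Rolling up means keeping fewer category names; drilling down means keeping more. Since `combine` is associative, partial results computed on different machines can be merged with the same function.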

      By the way: the identity of "combine" over ratings and prep times is not simply 0! Think about it harder than that. How do you combine user ratings? Most likely your value type will need to be able to sensibly represent "0/0" -- that's a valid and useful value sometimes!
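
      One way to do it, as a sketch: carry the sum and the count together, so that "no ratings yet" is the pair (0, 0) and not a rating of zero. `RatingSummary` is a hypothetical type; an averaged prep time could presumably take the same shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RatingSummary:
    total: int = 0  # sum of stars
    count: int = 0  # number of ratings

    def combine(self, other):
        return RatingSummary(self.total + other.total,
                             self.count + other.count)

    def mean(self) -> Optional[float]:
        # "0/0" is a legitimate value: there is no average yet.
        return self.total / self.count if self.count else None


EMPTY = RatingSummary()  # the identity element: 0/0
a = RatingSummary(total=9, count=2)
b = RatingSummary(total=4, count=1)
merged = a.combine(b)
```

      Averaging the averages directly would give the wrong answer whenever the groups differ in size; combining (total, count) pairs does not have that problem.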

      We didn't get particularly "mathy" operations with our pizza example, partly because it's a very "end user" domain and not a reusable mid-level domain like unit conversions or a physics engine. It's even more important for those domains. What's a Point minus another Point? Not a Point! Are "offset" and "rotate" associative? Commutative? Distributive? Think about these things if you want a reusable physics engine.
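
      A small sketch of the Point question, with hypothetical `Point` and `Vector` types: subtracting two points gives a vector, and adding two points is deliberately not defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    dx: float
    dy: float

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.dx + other.dx, self.dy + other.dy)
        return NotImplemented


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __sub__(self, other):
        # Point - Point is a displacement, not a position.
        if isinstance(other, Point):
            return Vector(self.x - other.x, self.y - other.y)
        return NotImplemented

    def offset(self, v):
        return Point(self.x + v.dx, self.y + v.dy)

    def rotate_quarter_turn(self):
        # 90 degrees counter-clockwise about the origin.
        return Point(-self.y, self.x)


p = Point(1, 0)
v = Vector(1, 0)
displacement = Point(3, 4) - Point(1, 1)
offset_then_rotate = p.offset(v).rotate_quarter_turn()
rotate_then_offset = p.rotate_quarter_turn().offset(v)
```

      Offsets commute with each other, but an offset and a rotation do not, which is exactly the kind of fact a reusable engine has to state up front.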

      This is not scripting. This is not even programming. It's software engineering.

  • > Sure, it could be, but is it useful to do that? If I stand up and shout "The price of a Banana is $4 per bushel!", you could record my voice and upload it as a raw wave file. That's the rawest "input event" you can come up with. Or you could write down "some random dude said that bananas cost $4 around 4:30 pm and I'm not sure whether I believe him or not". That's not the "raw input", it's been transcribed and modified and annotated. Yet it's almost certainly more useful to your system, and it's kinda like event sourcing. Kinda.

    Huh? Obviously you (ideally) keep the raw wave file, and the transcribed and annotated version is a downstream transformation of it that you'd use for most purposes, but you can always go back to the original raw data if you need to (e.g. if your transcriber turned out to be unreliable). That's much of the point of event sourcing.

    > The problem with worrying about whether something is "input" or "output" or "internal" is that you can just move the dotted line anywhere around your system to change those. If you break a monolith into independent reusable building blocks, those building blocks are going to have a completely different idea of what counts as input and output. But who cares? You're not changing any fundamental truth about how the domain works. Your domain model should really be independent of worrying about what's "input" and "output". Those lines move all the time.

    Not my experience at all. You can move your internal lines around all you like, but the data flow of the domain will not change. You take A and B in from outside, and ultimately you use them to compute C and send that back to outside; that changes rarely if at all. And if D is your normalised form of B, you'll always compute that from B, and which component you do it in might change but the B -> D -> C flow won't.

    > But surely that's not what you're storing in your system! That would be an extreme coupling between the concept of "selling shares" and "clicking a button". Those are completely unrelated ideas! Why would you want to tightly couple them!?

    On the contrary, they're tightly coupled in the user's understanding (which is the model that ultimately matters), so they should be tightly coupled in the domain model. The code should reflect the domain.