TTL support for streams #23
We have a need to delete streams after a time period that is agreed with the business. The driver is to cap CosmosDB costs by reducing the storage capacity requirement. As a result, some less-used services can change to single-partition collections, which significantly reduces their contribution to the total RU cost. Is it worth a discussion about a proposed approach, following which we could put a PR together?

Comments
I think this would be good to have in. Just want to check a few things. Happy to accept a PR if you have something ready?
I don't have a PR; I thought it best to discuss an approach first (and requirements - there are useful questions here). (And my better half just got back home, so I don't have a starter-for-ten idea!)
Sounds like the immediate requirement, then, is support for a hard delete. To be fair, a soft delete could be represented as an event anyway. As we don't currently have the concept of a stream-specific metadata document that could contain info about whether the stream as a whole is deleted, this is likely the simplest way to handle it if anyone wants that. Compression would be a good cost-saving measure, as it should help to reduce the RUs required simply due to needing a set amount per partition.
Here's a challenge: we're doing this to try to fit micro-services into one DocumentDB partition each (10GB storage cap). We're doing that to reduce the RU/s cost below the 2500 minimum for scalable storage capacity. If we add a streams collection to help with the TTL concept then we have an RU overhead. It would be small though, so should still save RUs overall. I was trying to look into whether a cunning index on the commits collection could provide the information about expired streams without needing a new dedicated streams collection, but I didn't get very far.
Just started to work on some ideas here on a branch. If all we need to do is ensure that old events are deleted, then setting the collection-level TTL should do the job. I've compared that to Event Store's TTL, and from the docs it looks to work the same way. This reduces the complexity of setting stream metadata. We can tackle that if a strong requirement for stream-specific TTLs comes up, and then look at efficient ways to reduce any RU impacts. Does that work for you?
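For context, this is roughly all the collection-level TTL amounts to in the DocumentDB SDK - a minimal sketch with placeholder account, database, and collection names, not SES's actual Initialize code:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class TtlSetup
{
    static async Task Main()
    {
        // Placeholder endpoint and key - not real values.
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com"), "<key>");

        var collection = new DocumentCollection
        {
            Id = "Commits",
            // A positive value expires every document that many seconds after
            // its last write; -1 enables TTL with no default, so documents
            // only expire if they carry their own "ttl" property.
            DefaultTimeToLive = (int)TimeSpan.FromDays(30).TotalSeconds
        };

        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri("EventStore"), collection);
    }
}
```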
If I've understood you right ... I don't think collection-level TTL will work (on the commits collection, anyway) - it will partially delete streams (i.e. delete older events but leave newer events in place). We need to either delete all events in a stream, or none of them - I think. I'd started thinking about a different option too, but it's very much a draft:
I might be wrong to be trying to avoid a "streams" collection - I haven't fed that thinking back into the points above yet. It might replace the index.
Why aren't partial deletions acceptable? Would you still be trying to read any of the older streams beyond a certain point in time? The simple way to avoid a streams collection is to have a "stream" document in the commits collection. That way you can apply changes in a transactionally consistent manner. However, this is engineering effort that I'd rather avoid if we can. What I'm proposing looks to be how this would work in Event Store too, based on:
Interesting thought, that. I've realised the event store may not be used in an event sourcing context, so I might be contaminating a pure store (SES) with a concern of a layer built on top of it (event sourcing).

I was thinking that partial deletions would be OK as long as they don't surface as partial event streams to callers - i.e. as long as it stays an implementation-level detail. If a caller innocently requests a certain event stream, I had thought that they should receive the whole stream or none of it. The event objects should exist for the same period of time as the final application state they represent. If, for example, the context is a DDD aggregate replaying events to rebuild state, it would get in a horrible mess because it would only rebuild state partially. A decision it had earlier reached might be undone and a different action taken, simply because some events had gone missing from its stream. I had thought that a stream should be append-only - I admit we're adding an exception by allowing the stream to be deleted, but at least we're not altering or corrupting the events inside a stream that still exists. If we were doing that, we'd probably need some sort of snapshot event capability before earlier events are deleted.

Adding a stream document in at the start is a neat solution - the TTL index would always cause that to be deleted with the first batch of events to be deleted. Potentially a stored procedure could be introduced to support reading from the stream, and that stored procedure could ensure that the stream document for the stream exists. If it doesn't, then the sproc returns no documents, even if a partial stream exists. That same principle could probably work without the stream document too - if the sproc makes sure that the document with id :1 always exists, for example.
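For illustration, here's roughly what that whole-stream-or-nothing guard could look like in client-side C# rather than a sproc - a sketch only, where `EventDoc`, `Version`, and the exception type are made-up names, and stream versions are assumed to start at 1:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class EventDoc
{
    public string StreamId { get; set; }
    public int Version { get; set; }
}

public class StreamDeletedException : Exception { }

public static class StreamReads
{
    // Return the whole stream or nothing: if the collection TTL has partially
    // expired the stream, version 1 will be missing, so surface "stream gone"
    // rather than a truncated stream.
    public static IReadOnlyList<EventDoc> ReadWholeStreamOrNothing(IEnumerable<EventDoc> commits)
    {
        var events = commits.OrderBy(e => e.Version).ToList();

        if (events.Count == 0 || events[0].Version != 1)
            throw new StreamDeletedException();

        return events;
    }
}
```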
BTW ... I think I'd be reluctant to leave a stream metadata document behind (e.g. having IsStreamDeleted=true). I know it's a tiny amount of storage, but I like the concept that the TTL literally caps storage needs - it completely removes the concept of storage consumption that just keeps growing. Still thinking about that though.
So ... summarising my current feeling: I think that if we have no need to reflect back on deleted streams' metadata, I'm currently in favour of the collection-level TTL that you mention, and a read sproc (or C# code) that returns no events to the caller if the first event has been deleted. That allows partial deletes inside the storage engine, but doesn't surface them to the consumer.
Do you have more of a need for the stream document, and for consumers to check the IsStreamDeleted property before working with a stream's contents? Pants, I never answered one of your questions: no, we shouldn't be requesting streams past the TTL point, but if some accident happens and we do request the stream, I'd like SES not to return it after it has expired, rather than have application code trying to process a stream with early events missing. It might look on the surface as if SES has a bug if events quietly start disappearing from streams, whereas when the whole stream goes it's more obvious that the "stream TTL" is responsible.
If you commit to using a TTL, isn't it an intentional decision if events disappear? In terms of soft delete, we could represent this via a doc, but the simplest way to handle it would just be to put a "deleted" event onto the stream; that way you can process it via the change feed. I'm pretty sure I've read of approaches before where people have soft deleted for X period of time until the events/streams aren't used, and then hard deleted them.
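To make that concrete, a sketch of what such a marker event might look like - the type and property names are illustrative, not anything SES defines:

```csharp
using System;
using Newtonsoft.Json;

// A "deleted" marker event appended as the stream's final commit: consumers
// (e.g. via the change feed) treat the stream as soft-deleted from this point,
// and the collection-level TTL hard-deletes the documents later.
public class StreamSoftDeleted
{
    [JsonProperty("streamId")]
    public string StreamId { get; set; }

    [JsonProperty("deletedAtUtc")]
    public DateTimeOffset DeletedAtUtc { get; set; }

    // Optional: when TTL is enabled on the collection, Cosmos honours a
    // per-document "ttl" override (in seconds), so the marker itself could be
    // kept for a grace period and then hard-deleted too.
    [JsonProperty("ttl", NullValueHandling = NullValueHandling.Ignore)]
    public int? TimeToLiveSeconds { get; set; }
}
```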
At API level, I'd like to commit to a TTL on a stream so all events disappear, not a TTL on individual events so just some of them disappear (I'm remembering A.Finance.Fraud :)). A TTL that hard deletes is fine (i.e. best) for our need. The above could be implemented using a TTL on the commits collection behind the scenes, provided that implementation was invisible to the caller. I don't have need of a stream metadata document, but I'm happy to check a Boolean on such a document if that's preferred by everyone. We would then make two API calls every time we loaded an event stream, though, so possibly a single API method that returned both the stream metadata and the events would be good, to provide an opportunity to avoid multiple calls to CosmosDB under the hood. I didn't think this preference for a stream TTL over an event TTL was weird given the desired immutability of events within a stream, but if I'm alone in that view, ignore me...
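Something like this hypothetical shape, perhaps - none of these types exist in SES today; it's just the combined-call idea, so checking a deleted flag doesn't cost a second round trip to CosmosDB:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class StreamMetadata
{
    public bool IsStreamDeleted { get; set; }
}

public class StreamReadResult<TEvent>
{
    public StreamMetadata Metadata { get; set; }
    public IReadOnlyList<TEvent> Events { get; set; }
}

public interface IStreamReader<TEvent>
{
    // One call returns the metadata and the events together, so the store can
    // satisfy both from a single query under the hood.
    Task<StreamReadResult<TEvent>> ReadStreamWithMetadataAsync(string streamId);
}
```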
Oh, I get what you're saying ... we could soft delete ourselves using our own event, but then hard delete with a TTL after that point in time. I still think it creates a strange unit-testing demand on an aggregate, because we would need to test that it would not fail / have adverse side-effects when replaying event streams where the first parts of the stream had been deleted (lots of test cases too?), until it encountered the "soft delete" event. It's not as simple for the consumer as "whole stream has gone". Also, the challenge would be storing the soft-delete event: we'd need a process that triggered/stored that event, e.g. x months after an interaction had stopped.
Sorry, thought about this some more. A hard-delete TTL on the commits collection (i.e. on events, not on streams) would work fine for us, exactly as you originally proposed it. We have no need of the stream metadata document at the moment, but it wouldn't cause us any problems if you did want to add it. We only retrieve entire streams, so we would detect a stream with a missing first event in our event sourcing library and surface it as stream-not-found to application code. I think it would be neater if SES had the concept of a TTL on the entire stream, but if that's not really a good concern for the event store, then our event sourcing abstraction on top of the store would be where we supported it.
I've pushed an early version of this out to the supportTTL branch. Any feedback would be great. I'd want to add a few more tests just to make sure stream reads & writes aren't impacted before merging this in - I can't see why they would be, based on the existing implementation, though. At present, the TTL would only be configured for new stores via the Initialize method - for existing event stores, we could handle this in a few different ways:
Let me know what you think works.
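As a point of reference, changing the default TTL on an already-created collection is a small script against the DocumentDB SDK. A sketch with placeholder account, database, and collection names - a one-off script rather than something Initialize would do:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class UpdateTtl
{
    static async Task Main()
    {
        // Placeholder endpoint and key - not real values.
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com"), "<key>");
        var collectionUri = UriFactory.CreateDocumentCollectionUri("EventStore", "Commits");

        // Read the existing collection, change its default TTL, and replace it.
        var response = await client.ReadDocumentCollectionAsync(collectionUri);
        var collection = response.Resource;

        collection.DefaultTimeToLive = (int)TimeSpan.FromDays(60).TotalSeconds;
        await client.ReplaceDocumentCollectionAsync(collection);
    }
}
```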
The changes in that branch look perfect to me - thanks for that. I'm happy to change the TTL using a script outside of SES. As we're checking whether a collection exists on initialise, it doesn't sound any more costly to instead retrieve the collection configuration and check it (then create the collection if it doesn't exist, or modify it if it does). However, we'd have to manage (or at least test) the race condition between multiple consumers starting at the same time, both trying to "correct" the TTL or other config if it has changed. Probably better to tackle that as a separate PR, and then only if we find config changing often?
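For that race condition, one option would be to guard the replace with the collection's ETag, so concurrent "corrections" fail loudly rather than overwrite each other. A sketch, assuming the SDK's If-Match access condition applies to collection replaces the way it does to documents:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class TtlCorrector
{
    // Read-check-replace guarded by an If-Match ETag: if two consumers start at
    // once, the loser's replace fails with 412 PreconditionFailed instead of
    // silently overwriting, and it can re-read and retry (or give up).
    public static async Task EnsureTtlAsync(DocumentClient client, Uri collectionUri, int desiredTtlSeconds)
    {
        var response = await client.ReadDocumentCollectionAsync(collectionUri);
        var collection = response.Resource;

        if (collection.DefaultTimeToLive == desiredTtlSeconds)
            return;

        collection.DefaultTimeToLive = desiredTtlSeconds;
        await client.ReplaceDocumentCollectionAsync(collection, new RequestOptions
        {
            AccessCondition = new AccessCondition
            {
                Type = AccessConditionType.IfMatch,
                Condition = collection.ETag
            }
        });
    }
}
```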
Agree with you that Initialize shouldn't be trying to change the config once applied. We could potentially provide some scripts for convenience, but in this instance it feels a little like overkill, as this is just a standard feature of Cosmos. I'll add the additional tests and then push this into master. I'm aiming to do that by the end of the week at the latest.