[Producer] Discovery: Sad path for event production #54
Some discussion of handling producer shutdown can be found in openedx/event-bus-kafka#11
Useful suggestions: https://www.confluent.io/blog/error-handling-patterns-in-kafka/
At the very least, we're probably going to have to consider three different types of issues:
1/2 are probably harder, since the only really safe thing to do is to have some sort of persistent storage (like a DB table) that keeps track of events that happened but never made it to the event bus at all. How this is implemented would probably vary wildly between services.
3 is probably where we can actually be the most helpful, and where things like retry and DLQ topics come in. This could also be an iterative process that changes as we get more concrete use cases. For example, one very simple first shot would be to create a DLQ topic alongside every actual topic. The DLQ topic would need very open permissions and a very loose schema (maybe something like `{'source': <where I'm coming from>, 'event_key_as_string': a_long_string, 'event_value_as_string': a_very_long_string}`).
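A minimal sketch of what that loose DLQ record could look like, using only the standard library. The three field names come from the schema suggested above; the `-dlq` topic suffix and all function names are hypothetical, chosen for illustration, and nothing here is an existing event-bus API.

```python
import json

# Hypothetical naming convention: one DLQ topic alongside every actual topic.
DLQ_SUFFIX = "-dlq"


def dlq_topic_for(topic: str) -> str:
    """Return the (assumed) name of the DLQ topic paired with `topic`."""
    return f"{topic}{DLQ_SUFFIX}"


def build_dlq_record(source: str, event_key, event_value) -> dict:
    """Build a deliberately loose DLQ record.

    Key and value are stringified so that anything can be written to the DLQ,
    even if it failed to serialize against the real topic's schema.
    """
    return {
        "source": source,
        "event_key_as_string": str(event_key),
        "event_value_as_string": str(event_value),
    }


def serialize_dlq_record(record: dict) -> bytes:
    """Encode the record as UTF-8 JSON, ready to hand to a producer."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

Stringifying both key and value keeps the DLQ schema independent of the original topic's schema. That is the point of the "super loose" suggestion, but it also means consumers of the DLQ would have to parse the strings themselves.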
For 1/2, we might be able to help out with a standardized log format that would at least allow anyone to grep for these kinds of events and be confident that they found all of them in Splunk. It's not great but it's a step.
Example of a first shot at 3: openedx/event-bus-kafka#43
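One possible shape for such a standardized log line, as a sketch only: a fixed marker string that is easy to grep for, followed by a JSON payload. The marker `EVENT_BUS_PRODUCE_FAILURE`, the field names, and the function names are all assumptions made for illustration, not an agreed format.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical fixed marker so every failure can be found with a single search.
FAILURE_MARKER = "EVENT_BUS_PRODUCE_FAILURE"


def format_failure_line(topic, event_type, event_key, event_data, error) -> str:
    """Return one greppable line describing an event that was not produced."""
    payload = {
        "topic": topic,
        "event_type": event_type,
        "event_key": str(event_key),
        "event_data": event_data,
        "error": repr(error),
    }
    # default=str keeps the log call from failing on non-JSON-serializable data.
    return f"{FAILURE_MARKER} {json.dumps(payload, sort_keys=True, default=str)}"


def log_produce_failure(topic, event_type, event_key, event_data, error) -> str:
    """Log the failure at ERROR level and return the line that was logged."""
    line = format_failure_line(topic, event_type, event_key, event_data, error)
    logger.error(line)
    return line
```

With a format like this, searching logs for the marker string would return every recorded failure, and the JSON tail would carry enough detail to replay an event by hand.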
I will do more of a review of these notes and PRs, but just wanted to write some notes that have been bouncing around in my brain. Capturing (potentially premature) assorted thoughts:
More thoughts: :)
Closing this while we're going with logging.
Discovery for error handling of event production may result in an implementation (or POC branches), and/or documentation and further ticketing.
We want to explore the space of 'things that could go wrong with event production' and use this ticket to enumerate the ones we know about.
Questions: