Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Producer] Make sure we don't lose events on producer shutdown #11

Open
timmc-edx opened this issue Jul 27, 2022 · 3 comments
Open

[Producer] Make sure we don't lose events on producer shutdown #11

timmc-edx opened this issue Jul 27, 2022 · 3 comments
Assignees
Labels
event-bus Work related to the Event Bus.

Comments

@timmc-edx
Copy link
Contributor

timmc-edx commented Jul 27, 2022

As outlined in #10 we should make sure to flush out any remaining unsent batch of events when the Django server is shutting down.

  • Call flush() at shutdown, to drain buffer
  • Make sure that linger.ms, batch.size, and any librdkafka queue settings are set so that messages are sent more or less immediately, for cases when the server shuts down unexpectedly

Reference:

Acceptance Criteria

  1. Make a proposal about what we should do.
  2. If it's easy, maybe submit a PR.
  3. If it's not, make a new ticket with details and updated acceptance criteria.
@timmc-edx timmc-edx added event-bus Work related to the Event Bus. backlog Item is on a team's backlog or wish list. labels Jul 27, 2022
@timmc-edx timmc-edx added this to the [Event Bus] Future milestone Jul 27, 2022
@timmc-edx timmc-edx moved this to Todo in Arch-BOM Jul 27, 2022
@robrap robrap changed the title Make sure we don't lose events on producer shutdown [Sad] Make sure we don't lose events on producer shutdown Aug 9, 2022
@robrap robrap changed the title [Sad] Make sure we don't lose events on producer shutdown Make sure we don't lose events on producer shutdown Aug 9, 2022
@dianakhuang dianakhuang removed the backlog Item is on a team's backlog or wish list. label Aug 25, 2022
@robrap robrap changed the title Make sure we don't lose events on producer shutdown [Producer] Make sure we don't lose events on producer shutdown Aug 25, 2022
@dianakhuang dianakhuang self-assigned this Aug 25, 2022
timmc-edx added a commit that referenced this issue Aug 26, 2022
Purpose:

- Revisit #16 since I
  finally figured out a clean way to have a single producer.
- Reduce the burden on future code that will need to adjust how polling
  is done (#31) and
  maybe handle shutdown (#11)
- Prepare for configurable implementation loading, which will need a
  singleton and getter: openedx/openedx-events#87
- Get rid of the `sync` argument (which didn't fit the abstraction) and
  move it to a dedicated method.

Relying code should now call `get_producer().send(...)` rather than
`send_to_event_bus(...)`. The return value is an object that wraps a
`Producer` instance (not a `SerializingProducer`) and that handles the
serialization itself.

Serialization logic is moved to a cached `get_serializers(...)` that
expands upon the previous `get_serializer` function; it now returns a pair
of key and value serializers. This also acts as a patch point for mocking.

`send_to_event_bus` gets a shorter name (now it's just a `send` method)
and loses the `sync` keyword argument; there is instead now a
`pre_shutdown` method.
timmc-edx added a commit that referenced this issue Aug 26, 2022
Purpose:

- Revisit #16 since I
  finally figured out a clean way to have a single producer.
- Reduce the burden on future code that will need to adjust how polling
  is done (#31) and
  maybe handle shutdown (#11)
- Prepare for configurable implementation loading, which will need a
  singleton and getter: openedx/openedx-events#87
- Get rid of the `sync` argument (which didn't fit the abstraction) and
  move it to a dedicated method.

Relying code should now call `get_producer().send(...)` rather than
`send_to_event_bus(...)`. The return value is an object that wraps a
`Producer` instance (not a `SerializingProducer`) and that handles the
serialization itself.

Serialization logic is moved to a cached `get_serializers(...)` that
expands upon the previous `get_serializer` function; it now returns a pair
of key and value serializers. This also acts as a patch point for mocking.

`send_to_event_bus` gets a shorter name (now it's just a `send` method)
and loses the `sync` keyword argument; there is instead now a
`pre_shutdown` method.
timmc-edx added a commit that referenced this issue Aug 26, 2022
Purpose:

- Revisit #16 since I
  finally figured out a clean way to have a single producer.
- Reduce the burden on future code that will need to adjust how polling
  is done (#31) and
  maybe handle shutdown (#11)
- Prepare for configurable implementation loading, which will need a
  singleton and getter: openedx/openedx-events#87
- Get rid of the `sync` argument (which didn't fit the abstraction) and
  move it to a dedicated method.

Relying code should now call `get_producer().send(...)` rather than
`send_to_event_bus(...)`. The return value is an object that wraps a
`Producer` instance (not a `SerializingProducer`) and that handles the
serialization itself.

Serialization logic is moved to a cached `get_serializers(...)` that
expands upon the previous `get_serializer` function; it now returns a pair
of key and value serializers. This also acts as a patch point for mocking.

`send_to_event_bus` gets a shorter name (now it's just a `send` method)
and loses the `sync` keyword argument; there is instead now a
`pre_shutdown` method.
@dianakhuang dianakhuang moved this from Todo to Groomed in Arch-BOM Aug 26, 2022
timmc-edx added a commit that referenced this issue Aug 26, 2022
Purpose:

- Revisit #16 since I
  finally figured out a clean way to have a single producer.
- Reduce the burden on future code that will need to adjust how polling
  is done (#31) and
  maybe handle shutdown (#11)
- Prepare for configurable implementation loading, which will need a
  singleton and getter: openedx/openedx-events#87
- Get rid of the `sync` argument (which didn't fit the abstraction) and
  move it to a dedicated method.

Relying code should now call `get_producer().send(...)` rather than
`send_to_event_bus(...)`. The return value is an object that wraps a
`Producer` instance (not a `SerializingProducer`) and that handles the
serialization itself.

Serialization logic is moved to a cached `get_serializers(...)` that
expands upon the previous `get_serializer` function; it now returns a pair
of key and value serializers. This also acts as a patch point for
mocking. I'd like to test the serializers themselves, but they want to
talk to a server.

`send_to_event_bus` gets a shorter name (now it's just a `send` method)
and loses the `sync` keyword argument; there is instead now a
`pre_shutdown` method.
timmc-edx added a commit that referenced this issue Aug 31, 2022
Purpose:

- Revisit #16 since I
  finally figured out a clean way to have a single producer.
- Reduce the burden on future code that will need to adjust how polling
  is done (#31) and
  maybe handle shutdown (#11)
- Prepare for configurable implementation loading, which will need a
  singleton and getter: openedx/openedx-events#87
- Get rid of the `sync` argument (which didn't fit the abstraction) and
  move it to a dedicated method.

Relying code should now call `get_producer().send(...)` rather than
`send_to_event_bus(...)`. The return value is an object that wraps a
`Producer` instance (not a `SerializingProducer`) and that handles the
serialization itself.

Serialization logic is moved to a cached `get_serializers(...)` that
expands upon the previous `get_serializer` function; it now returns a pair
of key and value serializers. This also acts as a patch point for
mocking. I'd like to test the serializers themselves, but they want to
talk to a server.

`send_to_event_bus` gets a shorter name (now it's just a `send` method)
and loses the `sync` keyword argument; there is instead now a
`prepare_for_shutdown` method.

Other refactoring:

- Cache `create_schema_registry_client` and rename to `get_...`
- Lift producer test data to be instance variables
@timmc-edx
Copy link
Contributor Author

timmc-edx commented Sep 22, 2022

While batch.size defaults to 16 kB, linger.ms defaults to zero, so messages should always be sent immediately. It is also likely that servers will be taken out of the load balancer and allowed to drain before shutdown, and that period will significantly exceed during of a request-handling and the few milliseconds required to send an event.

I also spent a while looking into ways to perform an action on shutdown and didn't find anything very promising. We could ask gunicorn to make a call, and we might be able to listen for a certain signal? But there's nothing that's obviously portable.

Given that, I think we can skip calling flush.

@dianakhuang dianakhuang assigned timmc-edx and unassigned dianakhuang Sep 28, 2022
@timmc-edx timmc-edx moved this from Groomed to In Progress in Arch-BOM Sep 28, 2022
@timmc-edx timmc-edx moved this from In Progress to Todo in Arch-BOM Oct 14, 2022
@timmc-edx
Copy link
Contributor Author

timmc-edx commented Oct 14, 2022

Deprioritizing this until we learn that this needs addressing. May just want to leave an ADR.

May also want to consider the celery worker case (workers get destroyed after a certain number of tasks.)

@timmc-edx
Copy link
Contributor Author

We should probably just switch to using the Outbox pattern so we can do away with all these issues with shutdown timing, logging to allow re-production, etc.

@robrap robrap removed the status in Arch-BOM Mar 23, 2023
@robrap robrap removed the backlog Item is on a team's backlog or wish list. label Apr 20, 2023
@robrap robrap removed this from the [Event Bus] Future milestone Jun 8, 2023
@jristau1984 jristau1984 moved this to Backlog in Arch-BOM Jul 1, 2024
@timmc-edx timmc-edx moved this from Backlog to Done in Arch-BOM Jul 18, 2024
@timmc-edx timmc-edx closed this as completed by moving to Done in Arch-BOM Jul 18, 2024
@timmc-edx timmc-edx moved this from Done to Backlog in Arch-BOM Jul 18, 2024
@timmc-edx timmc-edx reopened this Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
event-bus Work related to the Event Bus.
Projects
None yet
Development

No branches or pull requests

3 participants