Preserves ordering of exported records #39

Merged
merged 1 commit into from
Oct 3, 2020


npepinpe
Collaborator

Description

This PR implements a new Partitioner which assigns a consistent Kafka partition for all Zeebe records which share the same Zeebe partition, thereby preserving the ordering in which they were written in Zeebe (for a given Kafka topic).

The current approach is suboptimal: it maps the zeebePartitionId to a Kafka partition by taking it modulo the number of Kafka partitions for a given topic. This means that if there are more Kafka partitions than Zeebe partitions, some of the Kafka partitions will go unused. However, it solves an important issue, so it's a good interim solution until a better one is found.
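
To illustrate the mapping described above, here is a minimal sketch of the modulo computation. It is not the exporter's actual Partitioner class: the class name `ModuloPartitionSketch` and the method `kafkaPartitionFor` are hypothetical, and the Kafka partition count is passed in directly rather than read from cluster metadata.

```java
// Hypothetical sketch of the modulo mapping; names are illustrative only.
final class ModuloPartitionSketch {
  private ModuloPartitionSketch() {}

  /**
   * Maps a Zeebe partition ID to a Kafka partition index in [0, kafkaPartitionCount).
   * All records from the same Zeebe partition get the same Kafka partition,
   * so their relative order is preserved within a topic.
   */
  static int kafkaPartitionFor(final int zeebePartitionId, final int kafkaPartitionCount) {
    if (kafkaPartitionCount <= 0) {
      throw new IllegalArgumentException("kafkaPartitionCount must be positive");
    }
    // floorMod keeps the result non-negative even for unexpected negative IDs
    return Math.floorMod(zeebePartitionId, kafkaPartitionCount);
  }
}
```

For example, with Zeebe partitions 1, 2 and 3 and a topic with 6 Kafka partitions, records land only on Kafka partitions 1, 2 and 3, and partitions 0, 4 and 5 stay empty, which is the limitation mentioned above.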

Related issues

closes #31

@npepinpe npepinpe self-assigned this Sep 26, 2020
@npepinpe
Collaborator Author

/cc @tjwp @cameronbraid Let me know if this solution would not solve your issue

@npepinpe npepinpe added the bug Something isn't working label Sep 26, 2020
@npepinpe
Collaborator Author

I also considered what Cameron implemented: using the workflow instance key and falling back to the partition ID. Some issues I had with it:

  1. Not all RecordValue implementations implement WorkflowInstanceRelated, even when they are related to a workflow instance. I opened an issue for that on the Zeebe side.
  2. Some records have an effect on workflow instances but do not carry the workflow instance key. That's probably fine for most users, who only care about workflow instance related events, but it means you may lose the chain of causality, so you cannot expect the sequence of records you consume to exactly match the one written in Zeebe.
  3. You end up with many records with the same key, which means you cannot use log compaction (though it's arguable whether you'd want to use it with the exporter anyway).

I'm thinking of maybe providing both partitioners and letting users switch between them. For example, if you're only exporting workflow instance related records anyway (e.g. you configured your exporter to export only workflowInstance events), then the partitioner which uses the workflow instance key will work fine, and you will be able to use more Kafka partitions than Zeebe partitions.
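
As a rough sketch of what the alternative partitioner could look like (nothing like this is part of this PR), the idea is to hash the workflow instance key when a record carries one and fall back to the Zeebe partition ID otherwise. The class name `InstanceKeyPartitionSketch`, the method `kafkaPartitionFor`, and the use of `OptionalLong` to represent a possibly missing key are all assumptions made for illustration.

```java
import java.util.OptionalLong;

// Hypothetical sketch of a key-based partitioner with a partition ID fallback.
final class InstanceKeyPartitionSketch {
  private InstanceKeyPartitionSketch() {}

  /**
   * Picks a Kafka partition index in [0, kafkaPartitionCount).
   * Records with the same workflow instance key always map to the same
   * Kafka partition; records without a key fall back to the Zeebe partition ID.
   */
  static int kafkaPartitionFor(
      final OptionalLong workflowInstanceKey,
      final int zeebePartitionId,
      final int kafkaPartitionCount) {
    if (kafkaPartitionCount <= 0) {
      throw new IllegalArgumentException("kafkaPartitionCount must be positive");
    }
    if (workflowInstanceKey.isPresent()) {
      // Fold the 64-bit key into an int, then map it onto the Kafka partitions
      final int hash = Long.hashCode(workflowInstanceKey.getAsLong());
      return Math.floorMod(hash, kafkaPartitionCount);
    }
    // No workflow instance key available: same mapping as the partition ID approach
    return Math.floorMod(zeebePartitionId, kafkaPartitionCount);
  }
}
```

This spreads workflow instances over all Kafka partitions, but ordering is then only guaranteed per workflow instance, and records that fall back to the partition ID are not ordered relative to the keyed ones.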

@npepinpe npepinpe merged commit aea7ccb into master Oct 3, 2020
@npepinpe npepinpe deleted the 31-key-partitioner branch October 3, 2020 15:35
@siddartha-ps

As mentioned above, is there any plan to provide a partitioner implementation that uses the workflow/process instance key instead of the partition ID, so that users can choose between the current RecordID/Partition ID partitioner and a workflow/process instance key partitioner?
We have more Kafka partitions than Zeebe partitions and would like to use all of the available Kafka partitions, but we are limited by the current RecordID/Partition ID partitioner implementation in the Zeebe Kafka exporter.

Successfully merging this pull request may close these issues.

Suggestion regarding kafka keys