Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust paritioning / hash behavior to comply with java lib #1114

Closed
wants to merge 2 commits into from

Conversation

mier85
Copy link
Contributor

@mier85 mier85 commented Jun 12, 2018

So we had a problem here with java and go services working together. Even though they are using the same hash function, the behavior afterwards is slightly different.

While java is doing this:
Abs(Hash) % numPartitions

Sarama was doing this:
Abs(Hash % numPartitions

I updated that according to how the java lib is doing it :

https://github.com/apache/kafka/blob/fa1d0383902260576132e09bdf9efcc2784b55b4/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java#L69

https://github.com/apache/kafka/blob/fa1d0383902260576132e09bdf9efcc2784b55b4/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L821

@eapache
Copy link
Contributor

eapache commented Jun 14, 2018

Huh, I'm a bit confused. We actually used to do the abs check first, but that was causing issues with some hash values so we had to change it (#709).

I think this is OK because it's doing a bitwise abs instead of a logical one, but I'm not sure if that ends up affecting the normalness of the distribution at the boundary?

@mier85
Copy link
Contributor Author

mier85 commented Jun 15, 2018

Yes, because of the bitwise operation, the problem described in #709 will not happen again and the tests for it are green. The problem with that was with the max integer values.

Even though this modification halves the basic set, it shouldn't affect the even total distribution amongst the partitions.

On the other hand, since the implementation differs in the official client and presumably other languages, the partitioning doesn't match which leads to other problems which has a more grave effect on how the whole partitioning works and leads to inconsistency.

@eapache
Copy link
Contributor

eapache commented Jun 15, 2018

Makes sense, I agree this is a worthwhile change. I'm wondering if it should be shipped as a new partitioner though, in case there are users depending on the old implementation now without realizing it (e.g. because they never used the official client at all).

@mier85
Copy link
Contributor Author

mier85 commented Jun 15, 2018

Making it an own partitioner would make sense for making a non-breaking migration possible even though, naturally, for us it means that we have to use this new hash function in every service.

Only other possible solution would be adjusting the version major numbers.

@eapache
Copy link
Contributor

eapache commented Jun 15, 2018

Thanks for reporting this, 6967cdb adds a NewReferenceHashPartitioner based on this PR.

@eapache eapache closed this Jun 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants