
sarama committed offsets don't match java consumer offsets #705

Closed

kanekv opened this issue Jul 15, 2016 · 5 comments

Comments


kanekv commented Jul 15, 2016

When we consume everything from a Kafka topic, Kafka's tool (bin/kafka-consumer-groups.sh) still shows a lag of 1 message. If we run a Java consumer with the same group, it consumes the 1 remaining message (which was already consumed by the sarama consumer).

Does sarama use its own offset scheme (one less than Java's)?

Versions

Sarama Version: master
Kafka Version: 0.10
Go Version: 1.6.2


eapache commented Jul 15, 2016

What are you using to commit offsets? Sarama does not commit offsets by default.


kanekv commented Jul 15, 2016

Using PartitionOffsetManager, committing manually.


eapache commented Jul 23, 2016

Huh, dug into this a bit more and found https://github.com/apache/kafka/blob/fa32545442ef6724aa9fb5f4e0e269b0c873288f/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L287-L288.

So in Java-land, you are expected to commit last-message-offset + 1 and then start from that. In sarama we always assumed that you would commit last-message-offset directly and then add +1 to get your next starting point (see https://github.com/Shopify/sarama/blob/366453c3e32f055595f4d5269d8f39aac5fdcdbb/offset_manager.go#L343).

I'm not sure what the correct solution is. We probably shouldn't be adding one in NextOffset - should we instead add one in MarkOffset, or should we just document it and leave that to the user?
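The off-by-one can be modeled without a broker. Below is a minimal Go sketch of the two conventions for a consumer that has just processed offset 41; the function names (`javaCommit`, `saramaCommitPreFix`) are illustrative, not real client API:

```go
package main

import "fmt"

// javaCommit models the Java client convention: commit
// last-consumed+1, then resume from the committed value directly.
func javaCommit(lastConsumed int64) (committed, resumeFrom int64) {
	committed = lastConsumed + 1
	resumeFrom = committed
	return
}

// saramaCommitPreFix models sarama's behavior before this fix:
// commit last-consumed directly; NextOffset then adds 1.
func saramaCommitPreFix(lastConsumed int64) (committed, resumeFrom int64) {
	committed = lastConsumed
	resumeFrom = committed + 1
	return
}

func main() {
	jc, jr := javaCommit(41)
	sc, sr := saramaCommitPreFix(41)
	fmt.Println(jc, jr) // 42 42
	fmt.Println(sc, sr) // 41 42
}
```

Both conventions resume from the same place (42), but the stored value differs by one, which is why kafka-consumer-groups.sh reports a lag of 1 for sarama-managed groups.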


kanekv commented Jul 23, 2016

@eapache I think we should follow what the Java consumer does; that would potentially allow mixing and interchanging consumers, and Kafka's built-in reporting tools would also look right.

eapache added a commit that referenced this issue Jul 26, 2016
Upstream requires you mark last-consumed+1 and then returns that value directly.
We were requiring you mark last-consumed and then adding one to the returned
value.

Match upstream's behaviour so that our offset tracking is interoperable.

Fixes #705.
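After this change, the expected calling pattern matches the Java client: mark last-consumed+1, and NextOffset returns the committed value directly. A minimal in-memory sketch of the new semantics (`fakePOM` is a hypothetical stand-in for sarama's PartitionOffsetManager, not the real type, and omits the metadata argument):

```go
package main

import "fmt"

// fakePOM is a toy model of the post-fix offset-manager semantics.
type fakePOM struct {
	committed int64
}

// MarkOffset records the offset; per the new convention the caller
// passes last-consumed+1, i.e. the next offset to read.
func (p *fakePOM) MarkOffset(offset int64) { p.committed = offset }

// NextOffset returns the committed value directly, matching the
// Java client's interpretation.
func (p *fakePOM) NextOffset() int64 { return p.committed }

func main() {
	pom := &fakePOM{}
	lastConsumed := int64(41)
	pom.MarkOffset(lastConsumed + 1) // commit 42, as the Java client would
	fmt.Println(pom.NextOffset())    // 42
}
```

With this convention the stored offset equals the next offset to consume, so kafka-consumer-groups.sh reports zero lag once the partition is fully consumed.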

dtjm commented Jul 29, 2016

I think it might be worthwhile to note that this is somewhat of a major API change. If someone recompiles their Kafka consumer with this patch and restarts it, they would end up with 1 duplicate message per partition: the previously committed offset, which used to mean last-consumed, is now interpreted as the next offset to read, so that message is redelivered.

Never mind, this is already discussed in #713.
