Incorrect metadata refresh when the seed broker used for refresh is stale and moved to another Kafka cluster #2637
Comments
@HaoSunUber 👋🏻 thanks for getting in touch. I'm certainly happy to review a PR that updates Sarama to prefer the least loaded existing broker connection when updating metadata. As you mention, that brings us more in line with the Java client, and re-using existing connections is also more efficient. Although the Java client doesn't currently fall back to the bootstrap addresses, I think it makes sense for Sarama to continue to do so. Regarding your specific issue of accidentally fetching from a different cluster, though, I think we probably want to take heed of the advice given in KIP-899 (which is also exploring this space for the Java client): "check the cluster ID [in any metadata response]" and "the client will fail if the returned cluster ID doesn't match the previously known one". In other words, as long as we're on V0_10_1_0 or newer, we should cache the cluster ID from the first metadata response and reject any response that doesn't match it for the lifetime of the client.
Yes, we also patched the clusterID comparison logic internally when fixing this problem. If you think it is reasonable, I can do another PR for the clusterID validation change. The cluster ID is always static over the client's lifecycle, so introducing this validation should not cause any regression.
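A minimal sketch of the cluster ID validation idea discussed above, in Go: cache the cluster ID from the first metadata response and reject any later response whose cluster ID differs. The `clusterIDGuard` type and its `check` method are hypothetical names for illustration, not Sarama's actual API.

```go
package sketch

import (
	"fmt"
	"sync"
)

// clusterIDGuard caches the cluster ID seen in the first metadata
// response and rejects responses from a different cluster afterwards.
// Illustrative sketch only, not Sarama's real implementation.
type clusterIDGuard struct {
	mu        sync.Mutex
	clusterID string // empty until the first metadata response arrives
}

// check validates the cluster ID carried by a metadata response.
// An empty ID is ignored because the field is only populated on
// metadata v2+ responses (Kafka 0.10.1.0 / Sarama V0_10_1_0 and newer).
func (g *clusterIDGuard) check(respClusterID string) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	switch {
	case respClusterID == "":
		return nil
	case g.clusterID == "":
		g.clusterID = respClusterID // first response: remember the cluster
		return nil
	case g.clusterID != respClusterID:
		return fmt.Errorf("metadata response from unexpected cluster %q, expected %q",
			respClusterID, g.clusterID)
	default:
		return nil
	}
}
```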
Seed brokers never change after client initialization. If the first seed broker becomes stale (still online, but moved to another Kafka cluster), the Sarama client may use this stale broker and fetch the wrong metadata. To avoid using a stale broker for metadata refresh, we choose the least loaded broker from the cached broker list, which is similar to how the Java client works: https://github.com/apache/kafka/blob/7483991a/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L671-L736 Contributes-to: IBM#2637 Signed-off-by: Hao Sun <[email protected]>
Thanks. Code merged. Closed.
Description
Hi, my name is Hao Sun from the Uber Kafka team. I'd like to report an issue with Sarama's metadata refresh.
Context about the Uber instance lifecycle management:
At Uber, a Kafka broker instance is automatically replaced when there is a host hardware issue. The replacement instance keeps the same IP address, is sometimes reused in another Kafka cluster, and never rejoins the original cluster.
Stale Broker Metadata Refresh Issue
In Sarama's current metadata refresh logic, the client uses the first available broker in the seed broker list to fetch metadata, and this seed broker list never changes after client initialization.
Unfortunately, if that seed broker (broker2 in our case) has been moved to another cluster, the metadata it returns isn't from the right cluster, and all cached metadata in the client is messed up. When we use this client to produce or consume, we encounter topic partition errors. The more dangerous case is that, if the two Kafka clusters have topics with the same name, we are likely to produce to or consume from the wrong topics in the wrong cluster.
Proposed fix:
We propose to change how a broker is picked for metadata refresh. Instead of using the static seed broker list, we will pick the least loaded broker from the cached broker list to refresh the metadata; if the cached broker list is empty, we will use the first broker in the seed broker list as we do now. We have already patched this internally and hope to create a PR. This approach matches the open-source Java Kafka client: the seed broker list is a bootstrap list used only during startup, and each metadata refresh picks the least loaded broker from the cached broker list (see the Java source code). A sketch of the idea is shown below.
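A minimal Go sketch of the proposed selection, assuming a simplified broker type with an in-flight request counter (the `broker` type and its field names are illustrative, not Sarama's real structures): prefer the connected cached broker with the fewest in-flight requests, and fall back to the seed list only when no cached broker is usable.

```go
package sketch

// broker is a simplified stand-in for sarama.Broker, carrying only the
// fields this sketch needs.
type broker struct {
	addr     string
	inFlight int  // number of requests currently awaiting a response
	online   bool // whether a connection is currently established
}

// leastLoadedBroker mirrors the Java client's leastLoadedNode(): it
// returns the connected cached broker with the fewest in-flight
// requests, falling back to the first seed broker when the cached
// broker list is empty or fully disconnected.
func leastLoadedBroker(cached, seeds []*broker) *broker {
	var best *broker
	for _, b := range cached {
		if !b.online {
			continue
		}
		if best == nil || b.inFlight < best.inFlight {
			best = b
		}
	}
	if best != nil {
		return best
	}
	if len(seeds) > 0 {
		return seeds[0] // bootstrap fallback, matching current behaviour
	}
	return nil
}
```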
Let me know what you think. If it looks good, I will create a PR for review.
Thanks.
Hao
Versions
This affects basic client behavior across all Sarama/Kafka/Go versions.