[fix] [broker] Fix wrong logic of method TopicName.getPartition(int index) #19841
This fix could solve #18149
Yes, I've seen the issue, and I'm sure it's the same one.
/pulsarbot rerun-failure-checks
Since most cases are hurt by the DLQ topic name, can we just prevent topic names containing both the -partition- and -DLQ keywords from running into the infinite loop? That way we will not break the current behavior.
In other words, we will not automatically create a partitioned topic for the DLQ topic even if users have enabled partitioned topic auto-creation. If users really need a partitioned topic for a DLQ, they should create it themselves. IMO, that would be a misuse of DLQ: users should not expect high throughput on a DLQ. That doesn't make sense for most real cases.
@@ -231,7 +231,7 @@ public String getEncodedLocalName() {
     }

     public TopicName getPartition(int index) {
-        if (index == -1 || this.toString().contains(PARTITIONED_TOPIC_SUFFIX)) {
+        if (index == -1 || this.toString().endsWith(PARTITIONED_TOPIC_SUFFIX + index)) {
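To make the one-line change concrete, here is a minimal, self-contained sketch that mimics the two conditions on plain strings. It does not use the real `TopicName` class: `PartitionNameSketch`, `oldGetPartition`, and `newGetPartition` are hypothetical names, and the suffix value `-partition-` is an assumption taken from the topic names discussed in this thread.

```java
public class PartitionNameSketch {
    // Assumed value of PARTITIONED_TOPIC_SUFFIX, based on the names discussed in this PR.
    static final String PARTITIONED_TOPIC_SUFFIX = "-partition-";

    // Mimics the condition before this PR: any occurrence of the suffix counts.
    static String oldGetPartition(String topic, int index) {
        if (index == -1 || topic.contains(PARTITIONED_TOPIC_SUFFIX)) {
            return topic;
        }
        return topic + PARTITIONED_TOPIC_SUFFIX + index;
    }

    // Mimics the condition after this PR: only a trailing "-partition-<index>" counts.
    static String newGetPartition(String topic, int index) {
        if (index == -1 || topic.endsWith(PARTITIONED_TOPIC_SUFFIX + index)) {
            return topic;
        }
        return topic + PARTITIONED_TOPIC_SUFFIX + index;
    }

    public static void main(String[] args) {
        String dlq = "tp1-partition-0-DLQ";
        System.out.println(oldGetPartition(dlq, 0)); // tp1-partition-0-DLQ
        System.out.println(newGetPartition(dlq, 0)); // tp1-partition-0-DLQ-partition-0
    }
}
```

With the old condition the DLQ name is returned unchanged because it merely contains the keyword; with the new condition a partition suffix is appended unless the name already ends with that exact partition index.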
What I thought was that the PR intends to allow a partitioned topic name to contain the -partition- keyword in order to resolve the issue. But that will introduce a new, unreasonable rule for partitioned topics.
we will not automatically create a partitioned topic for the DLQ topic even if users are enabled partitioned topic auto-creation.
Yes, PIP-263 (Just auto-create no-partitioned DLQ And Prevent auto-create a DLQ for a DLQ) tries to do this.
What I thought was the PR intent to allow the partitioned name can contain -partition- keyword to resolve the issue. But it will introduce a new unreasonable rule to the partitioned topic.
I think there are three separate things:
- Our rule for a partition name and how to check which index it is: end with -partition-x or contain -partition-x
  - The current logic is incorrect, and the current PR can fix it.
- Whether to allow creating a topic whose name contains -partition-
  - the behavior is denied for the Admin API
  - the behavior is allowed for the client (such as creation by a cmd-subscribe); see the tests testInfiniteHttpCallGetSubscriptions2 and testInfiniteHttpCallGetSubscriptions3 in the current PR
- Whether to create only non-partitioned topics for the Dead Letter or Retry Letter
I have submitted a quick fix for this issue. Please take a look. @Technoboy- @codelipenghui
Codecov Report
@@ Coverage Diff @@
## master #19841 +/- ##
=========================================
Coverage 72.93% 72.93%
- Complexity 31934 31963 +29
=========================================
Files 1868 1868
Lines 138449 138458 +9
Branches 15235 15236 +1
=========================================
+ Hits 100978 100991 +13
Misses 29438 29438
+ Partials 8033 8029 -4
…ptions caused by wrong topicName (#21997)

Similar to: #20131

The master branch has fixed the issue by #19841. Since that fix prevents users from receiving the messages in topics that were created by mistake, we did not cherry-pick #19841 into other branches (see #19841 for details).

### Motivation

#### Background of Admin API `PersistentTopics.createSubscription`

It works like this:
1. createSubscription(`tp1`)
2. is partitioned topic?
   - `no`: return subscriptions
   - `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

#### Background of the issue of `TopicName.getPartition(int index)`

```java
// Parse the name into a TopicName first; a plain String has no getPartition method.
TopicName partitionedTopic = TopicName.get("tp1-partition-0-DLQ");
TopicName partition0 = partitionedTopic.getPartition(0);
// Highlight: partition0 still refers to "tp1-partition-0-DLQ" (this is wrong).
// The correct name is "tp1-partition-0-DLQ-partition-0".
```

#### Issue

Therefore, if there is a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` works like this:
1. call Admin API `PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0, which returns `tp1-partition-0-DLQ`, then loop to step 1.

The resulting infinite chain of HTTP calls to `PersistentTopics.createSubscription` makes the broker crash.

### Modifications

#### Quick fix (this PR does it)

If the issue that produces the wrong topic name is hit, do not loop to step 1.

#### Long-term fix

PR #19841 fixes the issue that makes the topic name wrong, but it is not backward compatible; PIP 263 (#20033) will make compatibility good.
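The loop described in this commit message, and the idea behind the quick fix, can be sketched without a broker. The following is a hypothetical, simplified model and not the code of #21997: `createSubscription` is a plain recursive method rather than the real Admin API, the partitioned-topic lookup is a hard-coded set, the guard is only one possible reading of "do not loop to step 1", and a call counter with an upper bound stands in for the repeated HTTP calls.

```java
import java.util.Set;

public class CreateSubscriptionLoopSketch {
    static final String SUFFIX = "-partition-";
    // Hypothetical metadata: names registered as partitioned topics.
    static final Set<String> PARTITIONED = Set.of("tp1-partition-0-DLQ");

    // Old getPartition behaviour: the name is returned unchanged if it contains the suffix.
    static String buggyGetPartition(String topic, int index) {
        return topic.contains(SUFFIX) ? topic : topic + SUFFIX + index;
    }

    // Returns the number of calls made; stops at maxCalls so the sketch always terminates.
    static int createSubscription(String topic, boolean guard, int calls, int maxCalls) {
        calls++;
        if (calls >= maxCalls || !PARTITIONED.contains(topic)) {
            return calls;
        }
        String partition = buggyGetPartition(topic, 0);
        if (guard && partition.equals(topic)) {
            // Assumed guard: the "partition" is the topic itself, so do not loop to step 1.
            return calls;
        }
        return createSubscription(partition, guard, calls, maxCalls);
    }

    public static void main(String[] args) {
        // Without the guard the call count runs up to the bound; with it, one call is made.
        System.out.println(createSubscription("tp1-partition-0-DLQ", false, 0, 1000)); // 1000
        System.out.println(createSubscription("tp1-partition-0-DLQ", true, 0, 1000));  // 1
    }
}
```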
@poorbarcode The test PartitionKeywordCompatibilityTest is broken in branch-3.0. Cherry-picking this PR fixes the issue. Please let me know if you have concerns about cherry-picking this PR to branch-3.0.
https://github.com/apache/pulsar/pull/19841/files#diff-93feae220f18ea80cb01ec7f2cefeab410aa0f3f181eed1206da4a294db0f701R233-R234 This changed the behavior of getPartition(index): it will choose a different topic than before, which may cause the data in the original topic to become unconsumable.
@poorbarcode In which case would this apply?
See section 4 in the Motivation.
@poorbarcode Since #22705 has been cherry-picked to branch-3.0, does it prevent the problem from occurring?
Fixes: #18149
Motivation
The method TopicName.getPartition(int index) has a bug: for the partitioned topic "tp-partition-0-DLQ", it returns "tp-partition-0-DLQ" as partition 0 instead of "tp-partition-0-DLQ-partition-0".
Modifications
Fix the wrong logic of the method TopicName.getPartition(int index)
(Highlight) Note: New versions containing this patch cannot be upgraded smoothly
1. If users use the feature DLQ and enable topic auto-creation, the DLQ will be auto-created. For example:
These two topics will be auto-created:
primary-tp-partition-0-sub1-RETRY
2. And if users set the rule of the topic auto-creation like this:
Then two partitioned topics will be created:
primary-tp-partition-0-sub1-RETRY
primary-tp-partition-0-sub1-RETRY-partition-0
primary-tp-partition-0-sub1-RETRY-partition-1
3. (Highlight) And if the current issue is hit:
Then two partitioned topics will be created:
primary-tp-partition-0-sub1-RETRY
primary-tp-partition-0-sub1-RETRY
Just show an example:
primary-tp.
primary-tp-partition-0.
primary-tp-partition-0-sub1-DLQ.
primary-tp-partition-0-sub1-DLQ-partition-0
and primary-tp-partition-0-sub1-DLQ-partition-1
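The difference between sections 2 and 3 can be reproduced on plain strings. This is a sketch under stated assumptions: the retry topic is taken to have two partitions, `RetryPartitionsSketch` and its methods are hypothetical helpers rather than Pulsar APIs, and the suffix `-partition-` is assumed from the names listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryPartitionsSketch {
    static final String SUFFIX = "-partition-";

    // Partition names as computed by the old "contains" check.
    static List<String> partitionsBeforeFix(String topic, int numPartitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            names.add(topic.contains(SUFFIX) ? topic : topic + SUFFIX + i);
        }
        return names;
    }

    // Partition names as computed by the new "endsWith" check.
    static List<String> partitionsAfterFix(String topic, int numPartitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            names.add(topic.endsWith(SUFFIX + i) ? topic : topic + SUFFIX + i);
        }
        return names;
    }

    public static void main(String[] args) {
        String retry = "primary-tp-partition-0-sub1-RETRY";
        // Both "partitions" collapse onto the partitioned topic's own name.
        System.out.println(partitionsBeforeFix(retry, 2));
        // Two distinct partitions: ...-RETRY-partition-0 and ...-RETRY-partition-1.
        System.out.println(partitionsAfterFix(retry, 2));
    }
}
```

The two lists share no name, so a consumer that resolves partitions with the fixed logic never reads from the topic that the old logic wrote to.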
4. What happens when you hit the issue and upgrade to the new version
Before the upgrade, the retry topics exist as described in section 3, and consumers consume messages from the topic
primary-tp-partition-0-sub1-RETRY
After the upgrade, the two partitions below will be created, and consumers will consume messages from them:
primary-tp-partition-0-sub1-RETRY-partition-0
primary-tp-partition-0-sub1-RETRY-partition-1
But the messages in the original topic
primary-tp-partition-0-sub1-RETRY
will not be consumed anymore, so it will look as if some messages were lost.
5. Workaround
Delete the ZK node
primary-tp-partition-0-sub1-RETRY
before the upgrade. Pulsar will then consider this topic to be non-partitioned, and the partitions will not be created after the upgrade.
If users hit the current issue before, then the partition of the DLQ named
*-partition-*-partition-*
hits this issue.
6. (Highlight) Whether we need a multi-partition DLQ/Retry Topic
See PIP 263: