Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Fix wrong logic of method TopicName.getPartition(int index) #19841

Merged
merged 5 commits into from
Apr 23, 2023

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Mar 16, 2023

Fixes: #18149

Motivation

The method TopicName.getPartition(int index) has a bug which misidentifies "tp-partition-0-DLQ-partition-0" as "tp-partition-0-DLQ".

Modifications

Fix the wrong logic of the method TopicName.getPartition(int index)


(Highlight) Note: New versions containing this patch cannot be upgraded smoothly

1. If users use the feature DLQ and enable topic auto-creation, the DLQ will be auto-created. For example:

Consumer<String> consumer = pulsarClient.newConsumer(Schema.STRING)
                .topic("primary-tp-partition-0")
                .subscriptionName("sub1")
                .enableRetry(true)
                .subscribe();

These two topics will be auto-created:

  • primary-tp-partition-0-sub1-RETRY

2. And if users set the rule of the topic auto-creation like this:

allowAutoTopicCreationType=partitioned
defaultNumPartitions=2

Then will create two partitioned topics will be created:

  • primary-tp-partition-0-sub1-RETRY
    • primary-tp-partition-0-sub1-RETRY-partition-0
    • primary-tp-partition-0-sub1-RETRY-partition-1

3. (Highlight)And if hit current issue:

Then will create two partitioned topics will be created:

  • primary-tp-partition-0-sub1-RETRY
    • primary-tp-partition-0-sub1-RETRY

Just show an example:

  • there has a partitioned topic named primary-tp.
  • so there will have a partition named primary-tp-partition-0.
  • if enabled the feature DLQ, we get DLQ primary-tp-partition-0-sub1-DLQ.
  • and if the DLQ is a partitioned topic, we get two partitions: primary-tp-partition-0-sub1-DLQ-partition-0 and primary-tp-partition-0-sub1-DLQ-partition-1

4. What happens when you hit the issue and upgrade to the new version

There have retry topics as section-3 before the upgrade, and consumers consume messages from the topic primary-tp-partition-0-sub1-RETRY.

These two partitions below will be created, and consumers will consume messages from these two topics below

  • primary-tp-partition-0-sub1-RETRY-partition-0
  • primary-tp-partition-0-sub1-RETRY-partition-1

But the messages in the original topic primary-tp-partition-0-sub1-RETRY will not be consumed anymore. It seems some messages were lost.

5. Workaround

Delete the ZK node primary-tp-partition-0-sub1-RETRY before the upgrade. Pulsar considers this topic to be non-partitioned, then the partitions will not be created after the upgrade.

If users hits the current issue before

  • there is a partitioned topic named

then the partition of the DLQ whose named *-partition-*-partition-* hits this issue.

6. (Highlight) Whether need a multi-partition DLQ/Retry Topic

See PIP 263:

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@poorbarcode poorbarcode self-assigned this Mar 16, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 16, 2023
@poorbarcode poorbarcode added release/2.10.5 release/2.11.1 type/bug The PR fixed a bug or issue reported a bug area/broker labels Mar 16, 2023
@poorbarcode poorbarcode force-pushed the fix/infinite_get_subscriptions branch from 8cdcb15 to 8a7d584 Compare March 20, 2023 03:08
@Technoboy- Technoboy- added this to the 3.0.0 milestone Mar 20, 2023
@Technoboy- Technoboy- closed this Mar 20, 2023
@Technoboy- Technoboy- reopened this Mar 20, 2023
@wangjialing218
Copy link
Contributor

This fix could solve #18149

@poorbarcode
Copy link
Contributor Author

Hi @wangjialing218

This fix could solve #18149

Yes, I've seen the issue, and I'm sure it's the same one

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@codelipenghui codelipenghui added the release/important-notice The changes which are important should be mentioned in the release note label Mar 28, 2023
@Technoboy- Technoboy- changed the title [fix] [broker] fix infinite HTTP call getSubscriptions caused by wrong topicName [fix] [broker] Fix infinite HTTP call getSubscriptions caused by wrong topicName Apr 6, 2023
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As most cases suffer loss from the DLQ topic name, can we just avoid the topic name with -partition- and -DLQ keyword run into the infinite loop? So that we will not break the current behavior.

In other words, we will not automatically create a partitioned topic for the DLQ topic even if users are enabled partitioned topic auto-creation. If users really need a partitioned topic for DLQ, they should create it themselves. IMO, it should be a misuse of DLQ, users should not set a high throughput expected to a DLQ. That doesn't make sense for most real cases.

@@ -231,7 +231,7 @@ public String getEncodedLocalName() {
}

public TopicName getPartition(int index) {
if (index == -1 || this.toString().contains(PARTITIONED_TOPIC_SUFFIX)) {
if (index == -1 || this.toString().endsWith(PARTITIONED_TOPIC_SUFFIX + index)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I thought was the PR intent to allow the partitioned name can contain -partition- keyword to resolve the issue. But it will introduce a new unreasonable rule to the partitioned topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will not automatically create a partitioned topic for the DLQ topic even if users are enabled partitioned topic auto-creation.

Yes, PIP-263: Just auto-create no-partitioned DLQ And Prevent auto-create a DLQ for a DLQ try to do this.

Copy link
Contributor Author

@poorbarcode poorbarcode Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I thought was the PR intent to allow the partitioned name can contain -partition- keyword to resolve the issue. But it will introduce a new unreasonable rule to the partitioned topic.

I think there are three separate things:

  • Our rule for a partition name and how to check which index it is: end with -partition-x or contain -partition-x
    • The current logic is incorrect, and the current PR can fix it.
  • Whether to allow to create a topic whose name contains -paritition-
    • the behavior is denied for Admin API
      -the behavior is allowed for client(such as create by a cmd-subscribe), you can see the tests testInfiniteHttpCallGetSubscriptions2, testInfiniteHttpCallGetSubscriptions3 in current PR
  • Whether only create non-partitioned topics for the Dead Letter or Retry Letter

@poorbarcode
Copy link
Contributor Author

I have submitted a quick fix for this issue. Please take a look. @Technoboy- @codelipenghui

@poorbarcode poorbarcode force-pushed the fix/infinite_get_subscriptions branch from e6f54b6 to 73cb806 Compare April 23, 2023 02:55
@poorbarcode poorbarcode changed the title [fix] [broker] Fix infinite HTTP call getSubscriptions caused by wrong topicName [fix] [broker] Fix wrong logic of method TopicName.getPartition(int index) Apr 23, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #19841 (73cb806) into master (6036dcc) will increase coverage by 0.00%.
The diff coverage is 41.17%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #19841   +/-   ##
=========================================
  Coverage     72.93%   72.93%           
- Complexity    31934    31963   +29     
=========================================
  Files          1868     1868           
  Lines        138449   138458    +9     
  Branches      15235    15236    +1     
=========================================
+ Hits         100978   100991   +13     
  Misses        29438    29438           
+ Partials       8033     8029    -4     
Flag Coverage Δ
inttests 24.19% <0.00%> (+0.08%) ⬆️
systests 24.84% <0.00%> (-0.09%) ⬇️
unittests 72.22% <41.17%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...pulsar/broker/admin/impl/PersistentTopicsBase.java 64.30% <33.33%> (-0.10%) ⬇️
...mon/policies/data/stats/SubscriptionStatsImpl.java 69.91% <50.00%> (-0.74%) ⬇️
...ava/org/apache/pulsar/common/naming/TopicName.java 96.33% <100.00%> (ø)

... and 78 files with indirect coverage changes

@Technoboy- Technoboy- merged commit 6e585ac into apache:master Apr 23, 2023
@poorbarcode poorbarcode deleted the fix/infinite_get_subscriptions branch April 23, 2023 09:56
poorbarcode added a commit that referenced this pull request Jan 31, 2024
…ptions caused by wrong topicName (#21997)

Similar to: #20131

The master branch has fixed the issue by #19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick #19841 into other branches, see detail #19841)

### Motivation

#### Background of Admin API `PersistentTopics.createSubscription`
It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic? 
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

--- 

#### Background of the issue of `TopicName.getPartition(int index)`
```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```


#### Issue
Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic? 
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

### Modifications

#### Quick fix(this PR does it)
If hits the issue which makes the topic name wrong, do not loop to step 1.

#### Long-term fix
The PR #19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 #20033 will make compatibility good.
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 15, 2024
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch has fixed the issue by apache#19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick apache#19841 into other branches, see detail apache#19841)

It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic?
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```

Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

If hits the issue which makes the topic name wrong, do not loop to step 1.

The PR apache#19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 apache#20033 will make compatibility good.

(cherry picked from commit 4386401)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch has fixed the issue by apache#19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick apache#19841 into other branches, see detail apache#19841)

It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic?
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```

Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

If hits the issue which makes the topic name wrong, do not loop to step 1.

The PR apache#19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 apache#20033 will make compatibility good.

(cherry picked from commit 4386401)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch has fixed the issue by apache#19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick apache#19841 into other branches, see detail apache#19841)

It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic?
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```

Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

If hits the issue which makes the topic name wrong, do not loop to step 1.

The PR apache#19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 apache#20033 will make compatibility good.

(cherry picked from commit 4386401)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch has fixed the issue by apache#19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick apache#19841 into other branches, see detail apache#19841)

It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic?
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```

Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

If hits the issue which makes the topic name wrong, do not loop to step 1.

The PR apache#19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 apache#20033 will make compatibility good.

(cherry picked from commit 4386401)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch has fixed the issue by apache#19841 Since it will makes users can not receive the messages which created in mistake, we did not cherry-pick apache#19841 into other branches, see detail apache#19841)

It works like this:
1. createSubscription( `tp1` )
2. is partitioned topic?
  `no`: return subscriptions
  `yes`: createSubscription(`tp1-partition-0`)....createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

TopicName partition0 = partitionedTopic.getPartition(0);// Highlight: the partition0.toString() will be "tp1-partition-0-DLQ"(it is wrong).The correct value is "tp1-partition-0-DLQ-partition-0"
```

Therefore, if there has a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` will works like this:
1. call Admin API ``PersistentTopics.createSubscription("tp1-partition-0-DLQ")`
2. is partitioned topic?
3. yes, call `TopicName.getPartition(0)` to get partition 0 and will get `tp1-partition-0-DLQ` , then loop to step-1.

Then the infinite HTTP call `PersistentTopics.createSubscription` makes the broker crash.

If hits the issue which makes the topic name wrong, do not loop to step 1.

The PR apache#19841 fixes the issue which makes the topic name wrong, and this PR will create unfriendly compatibility, and PIP 263 apache#20033 will make compatibility good.

(cherry picked from commit 4386401)
@lhotari
Copy link
Member

lhotari commented Jul 29, 2024

@poorbarcode The test PartitionKeywordCompatibilityTest is broken in branch-3.0 . Cherry-picking this PR fixes the issue. Please let me know if you have concerns about cherry-picking this PR to branch-3.0.

@poorbarcode
Copy link
Contributor Author

@poorbarcode The test PartitionKeywordCompatibilityTest is broken in branch-3.0 . Cherry-picking this PR fixes the issue. Please let me know if you have concerns about cherry-picking this PR to branch-3.0.

https://github.com/apache/pulsar/pull/19841/files#diff-93feae220f18ea80cb01ec7f2cefeab410aa0f3f181eed1206da4a294db0f701R233-R234 This changed the behavior of getPartition(index) , which will choose different topic than before, it may cause the data in the original topic can not be consumed

@lhotari
Copy link
Member

lhotari commented Jul 29, 2024

This changed the behavior of getPartition(index) , which will choose different topic than before, it may cause the data in the original topic can not be consumed

@poorbarcode In which case would this apply?

@poorbarcode
Copy link
Contributor Author

@lhotari

This changed the behavior of getPartition(index) , which will choose different topic than before, it may cause the data in the original topic can not be consumed

@poorbarcode In which case would this apply?

See the section 4 in the Motivation

Screenshot 2024-07-29 at 18 04 12

@lhotari
Copy link
Member

lhotari commented Jul 29, 2024

@lhotari

This changed the behavior of getPartition(index) , which will choose different topic than before, it may cause the data in the original topic can not be consumed

@poorbarcode In which case would this apply?

See the section 4 in the Motivation

@poorbarcode Since #22705 has been cherry-picked to branch-3.0, does it prevent the problem from occurring?

lhotari pushed a commit that referenced this pull request Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker cherry-picked/branch-3.0 doc-not-needed Your PR changes do not impact docs ready-to-test release/important-notice The changes which are important should be mentioned in the release note release/3.0.6 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Partitioned DLQ topic with topic name contains -partition- may cause broker crash
6 participants