Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][broker] Unblock stuck Key_Shared subscription after consumer reconnect #21576

Conversation

nborisov
Copy link

Fixes #21199 and #15705

Motivation

Current delivery mechanism blocks recently joined consumers from getting messages until all the previously delivered but not acknowledged messages are either acknowledged by the consumer which received the messages or the consumer disconnected

This is implementation trade off which allows not to track all the sent messages. This approach allows to reduce memory consumption and increase performance

The same time such implementation leads to a situation when a single failed message could lead to all the consumers blockage (described at #21199)

There is a setting allowOutOfOrderDelivery which allows to mitigate the described issue but leads to ordering guarantees loss

For some scenarios both consumption stuck and out of order delivery is not an option. The same time it could be ok to have more memory consumption and some minor performance degradation

Modifications

  1. Sent but not acknowledged messages tracking added
  2. Messages tracking setting added

Tracking of delivered but not acknowledged messages allows not to send them to recently joined consumers and the same time preserve ordering

Making the new functionality switchable allows to preserve the current optimized memory consumption and performance for all the users with default settings. The same time users who is ready to memory and performance trade off getting opportunity to solve both consumers stuck problem and out of order delivery

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Existing KeySharedSubscriptionTest extended with scenarios with new setting enabled
  • Added integration test for verification of preserving ordering with the new setting enabled
  • Added integration test for verification that recently joined consumers are not blocked when there are some unacknowledged messages delivered to existing consumers

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

Nikolai Borisov and others added 30 commits November 13, 2023 13:57
… if the previous one is inactive (apache#21220)

### Motivation

If a producer establishes a new connection when it is reconnecting, while the previous connection is now inactive, the initial request made on the new connection will fail.  This failure will trigger the topic of cleaning up the inactive producers.  However, upon making a second request, the producer will be able to successfully establish a connection and proceed with the operation.


### Modifications

Make the initial request made on the new connection success.
… to skip lookups upon producer reconnections during unloading
…truction (apache#21270)

Reopen apache#21170
### Motivation
If a topic persistently experiences a substantial quantity of data inputs,  the act of reading all the messages present in this topic to build a TableView can take an excessive amount of time.
### Modification
In the process of constructing the TableView, initially, the last message ID of the current topic is procured. Consequently, once this last message ID has been reached, the creation ensues to its completion.
…on. (apache#21268)

### Motivation

Now, only the last chunk will be acknowledged when acknowledging chunk messages with transactions.
If the messageId is a `ChunkMessageIdImpl`, the ledger ID and entry ID will belong to the `lastChunkMsgId`.
https://github.com/apache/pulsar/blob/2b5c199053a5b2d7f849e6604d619bae9197a8c9/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L2791-L2814

https://github.com/apache/pulsar/blob/2b5c199053a5b2d7f849e6604d619bae9197a8c9/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ChunkMessageIdImpl.java#L30-L33
### Modifications

Flow the common message acknowledge logic, ack all the chunks when acknowledging messages with transactions.
…21445)

### Motivation

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.


### Modifications

- Get the topic policy before loading.
mattisonchao and others added 9 commits November 13, 2023 16:46
…pache#21542)

### Motivation

When an ML tries to create a new ledger, it will create a delay task to check if the ledger create request is timeout<sup>[1]</sup>.

However, we should cancel this delay task after the request to create new ledgers is finished. Otherwise, these tasks will cost unnecessary CPU resources.

### Modifications

Cancel the scheduled task after the create ledger request is finished
@github-actions github-actions bot added type/PIP doc-required Your PR changes impact docs and you will update later. labels Nov 14, 2023
@nborisov
Copy link
Author

Sorry: wrong place to create the PR :( will create a new one in the correct place

@nborisov nborisov closed this Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required Your PR changes impact docs and you will update later. type/PIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Subscription consumption stuck on consumer reconnect