[fix][broker] Fix unexpected subscription deletion caused by the cursor last active time not updated in time #17573
If I understand correctly, this issue is not related to broker restarts, right?
The point is that the subscriptionExpirationTimeMinutes check should be based on the last ack time of the consumers in the subscription, instead of the last time a consumer was added to or removed from the subscription.
.../src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentSubscription.java
If the broker shuts down gracefully, everything is fine. Because it will go through the
As for the check of pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java (lines 2351 to 2358 in 200f433):
If the consumer is connected, it will skip the check.
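The expiry check discussed in this comment can be sketched roughly as follows. This is a simplified illustration, not the code in PersistentTopic.java: the class ExpiryCheckSketch, the nested Sub type and the isExpired method are hypothetical names, and treating a non-positive expiration time as "expiry disabled" is an assumption of the sketch.

```java
// Simplified sketch of a subscription expiry check; all names are hypothetical.
final class ExpiryCheckSketch {

    /** Minimal stand-in for a subscription: connection state plus last active time. */
    static final class Sub {
        final boolean consumerConnected;
        final long lastActiveMillis;

        Sub(boolean consumerConnected, long lastActiveMillis) {
            this.consumerConnected = consumerConnected;
            this.lastActiveMillis = lastActiveMillis;
        }
    }

    /** Returns true when the subscription would be considered expired. */
    static boolean isExpired(Sub sub, long nowMillis, long expirationMinutes) {
        long expirationMillis = expirationMinutes * 60_000L;
        if (expirationMillis <= 0) {
            return false; // assumption: a non-positive value disables expiry
        }
        if (sub.consumerConnected) {
            return false; // a connected consumer skips the check
        }
        // Only the time since lastActive matters once no consumer is connected.
        return nowMillis - sub.lastActiveMillis > expirationMillis;
    }

    public static void main(String[] args) {
        long now = 10 * 60 * 60_000L; // ten hours after time zero
        System.out.println(isExpired(new Sub(true, 0L), now, 60));               // false
        System.out.println(isExpired(new Sub(false, 0L), now, 60));              // true
        System.out.println(isExpired(new Sub(false, now - 30 * 60_000L), now, 60)); // false
    }
}
```

Once no consumer is connected, the outcome depends entirely on how fresh lastActive is, which is why the value recovered after a broker failure matters.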
If a broker shuts down ungracefully, for example because of an OOM, a new broker loads the topics and may run the subscription expiry check before the consumers have reconnected, so the subscription may be deleted unexpectedly.
If the topic is loaded by another broker, the lastActive will be reset to clock.millis():
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java, line 305 in a6fe5bb:
this.lastActive = this.clock.millis();
How can the above situation happen?
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java, line 414 in a6fe5bb
I see. Actually, only a ledger rollover or a failed update of the cursor ledger will cause the
If we can persist the
I will take a look at this and try to persist the
After discussing with @codelipenghui: this PR mainly fixes the bug that a subscription may be deleted unexpectedly, caused by
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java, lines 2500 to 2514 in d7c09be
However, the persistence is triggered only in 4 places:
None of them is triggered at high frequency. The new broker may recover the cursor with
The better solution is to save the
@dragonls I have approved the PR. Please also copy your last comment into the PR description, so that we don't need to go through all the comments to understand what is and is not in the scope of this PR.
…or last active time not updated in time (apache#17573) (cherry picked from commit 8a9d70a) Signed-off-by: Zixuan Liu <[email protected]>
Fixes #17572
Motivation
The lastActive in ManagedCursorImpl is only updated in 3 places:
org.apache.pulsar.broker.service.persistent.PersistentSubscription#addConsumer
org.apache.pulsar.broker.service.persistent.PersistentSubscription#removeConsumer
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl#internalResetCursor
If a broker shuts down ungracefully, for example because of an OOM, a new broker loads the topics and may run the subscription expiry check before the consumers have reconnected, so the subscription may be deleted unexpectedly.
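The timeline behind this failure can be illustrated with a small simulation. It is only a sketch under stated assumptions: StaleLastActiveSketch and its nested Cursor are hypothetical stand-ins, the 60-minute expiration and the 3-hour consumption window are made-up numbers, and the sketch assumes the new broker ends up seeing the stale lastActive value rather than a fresh one.

```java
// Hypothetical simulation of a stale lastActive; not Pulsar code.
final class StaleLastActiveSketch {

    /** Cursor stand-in that refreshes lastActive only when a consumer is added or removed. */
    static final class Cursor {
        long lastActiveMillis;

        void onConsumerChange(long nowMillis) {
            lastActiveMillis = nowMillis;
        }

        void onAck(long nowMillis) {
            // Behavior before this change: acknowledgements do not touch lastActive.
        }
    }

    static boolean expired(long lastActiveMillis, long nowMillis, long expirationMinutes) {
        return nowMillis - lastActiveMillis > expirationMinutes * 60_000L;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        Cursor cursor = new Cursor();
        cursor.onConsumerChange(0L); // consumer connects at t = 0

        // The consumer acknowledges messages steadily for three hours.
        for (long t = minute; t <= 180 * minute; t += minute) {
            cursor.onAck(t);
        }

        // The broker dies at t = 180 min. Assumption: the new broker sees the stale
        // value and runs the expiry check at t = 181 min, before the consumer reconnects.
        long recoveredLastActive = cursor.lastActiveMillis;
        System.out.println(expired(recoveredLastActive, 181 * minute, 60)); // true
    }
}
```

Although the consumer was active one minute before the check, the subscription looks idle for 181 minutes and would be deleted.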
We need to update lastActive in ManagedCursorImpl during steady consumption, for example when a consumer acknowledges messages, so that lastActive better reflects the time the subscription was last active.
Note
This PR mainly fixes the bug that a subscription may be deleted unexpectedly because lastActive is not updated in time, which depends on the persistence of lastActive in zk.
lastActive is saved in zk in org.apache.bookkeeper.mledger.impl.ManagedCursorImpl#persistPositionMetaStore:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java, lines 2500 to 2514 in d7c09be
However, the persistence is triggered only in 4 places:
None of them is triggered at high frequency. The new broker may recover the cursor with a lastActive from zk that is not the real last active time. An unexpected subscription deletion can still occur if the broker shuts down ungracefully between two persistence operations.
A better solution is to save lastActive into the cursor ledger instead of only saving it to zk, but that is not the problem this PR solves and needs further discussion.
Modifications
Call cursor.updateLastActive() in org.apache.pulsar.broker.service.persistent.PersistentSubscription#acknowledgeMessage.
Verifying this change
This change added tests and can be verified as follows:
org.apache.pulsar.broker.service.persistent.PersistentTopicTest#testUpdateCursorLastActive
org.apache.pulsar.broker.service.persistent.PersistentSubscriptionTest#testAcknowledgeUpdateCursorLastActive
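The behavior these tests target can be sketched in isolation. The classes below are hypothetical stand-ins, not the Pulsar classes or the actual test code: ManualClock is an assumed hand-advanced clock used to keep the example deterministic, and Subscription.acknowledgeMessage omits all position bookkeeping so that only the lastActive refresh is visible.

```java
// Hypothetical sketch: acknowledging a message refreshes the cursor's lastActive.
final class AckUpdatesLastActiveSketch {

    /** Hand-advanced clock so the example does not depend on wall-clock time. */
    static final class ManualClock {
        private long nowMillis;

        long millis() {
            return nowMillis;
        }

        void advance(long deltaMillis) {
            nowMillis += deltaMillis;
        }
    }

    /** Cursor stand-in that tracks only the last active time. */
    static final class Cursor {
        private final ManualClock clock;
        private long lastActive;

        Cursor(ManualClock clock) {
            this.clock = clock;
            this.lastActive = clock.millis();
        }

        void updateLastActive() {
            lastActive = clock.millis();
        }

        long getLastActive() {
            return lastActive;
        }
    }

    /** Subscription stand-in: each acknowledgement refreshes the cursor. */
    static final class Subscription {
        private final Cursor cursor;

        Subscription(Cursor cursor) {
            this.cursor = cursor;
        }

        void acknowledgeMessage(long messageId) {
            // Position bookkeeping is omitted in this sketch.
            cursor.updateLastActive();
        }
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        Cursor cursor = new Cursor(clock);
        Subscription subscription = new Subscription(cursor);

        long before = cursor.getLastActive();
        clock.advance(5_000L);
        subscription.acknowledgeMessage(1L);

        System.out.println(cursor.getLastActive() - before); // 5000
    }
}
```

Injecting the clock rather than reading the system time is a choice made for this sketch so that the assertion on the elapsed time is exact.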
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc-required (Your PR needs to update docs and you will update later)
doc-not-needed (Please explain why)
doc (Your PR contains doc changes)
doc-complete (Docs have already been added)