[fix][metadata] Cleanup state when lock revalidation gets LockBusyException
#17700
...ar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java
If I understand this right, the root cause of this case is that we missed handling the revalidate failure in `revalidateIfNeededAfterReconnection`. Can we just add it there, like `lockWasInvalidated` did?
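To make the suggestion above concrete, here is a minimal Java sketch of handling a failed revalidation inside the reconnection hook itself, so that callers do not each need their own exception handling. The class `ReconnectRevalidationSketch`, the `connectionLost` method, and the `onLockLost` callback are hypothetical names introduced for illustration; only `revalidateIfNeededAfterReconnection` is taken from the discussion, and its body here is an assumption, not Pulsar's actual implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch; the names mirror the discussion, not Pulsar's real code.
class ReconnectRevalidationSketch {
    private final AtomicBoolean revalidateAfterReconnection = new AtomicBoolean(false);
    private final Supplier<CompletableFuture<Void>> revalidate; // performs the remote check
    private final Runnable onLockLost;                          // single cleanup hook

    ReconnectRevalidationSketch(Supplier<CompletableFuture<Void>> revalidate,
                                Runnable onLockLost) {
        this.revalidate = revalidate;
        this.onLockLost = onLockLost;
    }

    /** Marks the lock as needing a check once the session is back. */
    void connectionLost() {
        revalidateAfterReconnection.set(true);
    }

    /** Called on reconnection: revalidates at most once per flagged loss. */
    CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
        if (!revalidateAfterReconnection.compareAndSet(true, false)) {
            return CompletableFuture.completedFuture(null);
        }
        return revalidate.get().exceptionally(error -> {
            // Simplification (assumption): every failure is treated as lock loss
            // and handled here, in one place, rather than by every caller.
            onLockLost.run();
            return null;
        });
    }
}
```

A real implementation would likely distinguish failure types, for example retrying on connection errors and cleaning up only when the lock is held by someone else.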
I don't have suggestions, but I think we must cover this kind of change with tests.
@michaeljmarshall @lhotari you were working on this part last week PTAL
I don't want to invoke exception handling everywhere.
Add testing, convert to draft.
ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
// lock 2 will steal the distributed lock first.
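The scenario this test fragment sets up (a second lock manager taking over a lock that the first one still believes it holds) can be modelled with a toy in-memory store. This is only a sketch under stated assumptions: `StolenLockSketch`, `Store`, `forceAcquire`, `revalidate`, and `BusyException` are hypothetical names and do not correspond to Pulsar's `LockManager` API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the scenario; not Pulsar's LockManager API.
class StolenLockSketch {

    /** Stand-in for the "lock busy" failure discussed in this PR. */
    static class BusyException extends RuntimeException {
        BusyException(String message) {
            super(message);
        }
    }

    /** Toy metadata store that tracks one owner per path. */
    static class Store {
        private final Map<String, String> owners = new ConcurrentHashMap<>();

        /** Unconditionally takes the path, modelling a "steal". */
        void forceAcquire(String path, String owner) {
            owners.put(path, owner);
        }

        /** Succeeds only if the path is free or already owned by the caller. */
        void revalidate(String path, String owner) {
            String current = owners.putIfAbsent(path, owner);
            if (current != null && !current.equals(owner)) {
                throw new BusyException(path + " is held by " + current);
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.revalidate("/locks/key-1", "manager-1");   // first acquisition
        store.forceAcquire("/locks/key-1", "manager-2"); // lock 2 steals it
        try {
            store.revalidate("/locks/key-1", "manager-1");
        } catch (BusyException e) {
            // At this point manager-1 has to stop treating its lock as valid.
            System.out.println("revalidation failed: " + e.getMessage());
        }
    }
}
```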
I'm not sure why we need the steal-lock operation. But this is a potential risk: it will introduce too many small ledgers if we encounter this problem.
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java
…ination/impl/ResourceLockImpl.java Co-authored-by: Penghui Li <[email protected]>
LGTM
…ception` (apache#17700) ### Motivation In the production environment, we found two brokers holding the same valid locks. and one has an exceptional revalidate future with `lockBusyException`. after reading the code, there may forget the reset the cache and complete expire exception when getting lockBusyException. (cherry picked from commit 955ae34)
…est !188) Squash merge branch 'release_2.8.1.4_fix_metadata' into 'release-2.8.1.4' Fixes #<xyz> ### Motivation Cherry-picked three PRs: PIP-45: Handle session events and invalidations from single thread (apache#12184); [fix][metadata] Set revalidateAfterReconnection true for certain failures (apache#17664); [fix][metadata] Cleanup state when lock revalidation gets LockBusyException (apache#17700). TAPD: --story=881235137
Motivation
In a production environment, we found two brokers holding the same valid lock, and one of them had a revalidate future that had completed exceptionally with `LockBusyException`. After reading the code, it appears that it may fail to reset the cache and to complete the lock's expiration when it gets `LockBusyException`.
Snapshot: broker A
Snapshot: broker B
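The failure mode described above can be illustrated with a small Java sketch: when revalidation fails because someone else holds the lock, the local cache entry is dropped and an expiry future is completed so the holder stops believing the lock is valid. All names here (`LockCleanupSketch`, `LocalLock`, `onRevalidated`, `BusyException`) are hypothetical, and completing the expiry future normally rather than exceptionally is an assumption; this is not the code changed in `ResourceLockImpl`.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins; none of these names are Pulsar's real classes.
class LockCleanupSketch {

    /** Stand-in for the "lock busy" failure described in the PR. */
    static class BusyException extends RuntimeException {
        BusyException(String message) {
            super(message);
        }
    }

    /** One broker's local view of a distributed lock. */
    static class LocalLock {
        final String path;
        final Map<String, String> cache; // local cache of lock values
        final CompletableFuture<Void> expired = new CompletableFuture<>();

        LocalLock(String path, String value, Map<String, String> cache) {
            this.path = path;
            this.cache = cache;
            cache.put(path, value);
        }

        /** True while this broker still believes it owns the lock. */
        boolean looksValid() {
            return cache.containsKey(path) && !expired.isDone();
        }

        /** Reacts to the outcome of a revalidation attempt. */
        CompletableFuture<Void> onRevalidated(CompletableFuture<Void> revalidation) {
            return revalidation.whenComplete((ignore, error) -> {
                // Dependent stages wrap failures in CompletionException; unwrap it.
                Throwable cause = error;
                if (error instanceof CompletionException && error.getCause() != null) {
                    cause = error.getCause();
                }
                if (cause instanceof BusyException) {
                    // Another owner holds the lock: drop local state so this
                    // broker no longer reports the lock as valid.
                    cache.remove(path);
                    expired.complete(null);
                }
            });
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        LocalLock lock = new LocalLock("/locks/topic-1", "broker-a", cache);
        CompletableFuture<Void> revalidation = new CompletableFuture<>();
        lock.onRevalidated(revalidation);
        revalidation.completeExceptionally(new BusyException("owned by broker-b"));
        System.out.println("still looks valid: " + lock.looksValid());
    }
}
```

Without the cleanup branch, `looksValid()` would keep returning `true` after the failed revalidation, which corresponds to the two-brokers-one-lock situation reported here.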
Modifications
Clean up state when lock revalidation gets `LockBusyException`.
`retryWhenConnectionLost` parameter to avoid revalidating an unused lock when acquisition fails.
Verifying this change