Cycle detection in a multi-thread environment leads to OutOfMemoryError #1510
Comments
We are also affected by this issue.
Any update here?
Using the reproducer from the description, I bisected this to one commit: it is the first commit I see the OOM with. I imagine something in its changes is responsible.
Hi @sameb, I see you are still active and were part of the original PR that caused this OOM: #915. Could you take a look here? Is there maybe some oversight that causes an endless loop in the cycle detection?

Best regards and thanks,
This change prevents an endless loop in `ReentrantCycleDetectingLock.addAllLockIdsAfter()`.

For reasons that are not yet clear, according to `CycleDetectingLockFactory.locksOwnedByThread` and `ReentrantCycleDetectingLock.lockOwnerThread`, a thread can both own a lock and wait on that same lock. This leads to an endless loop in the cycle detection.

The change adds a workaround, forcing the cycle detection to exit if the above condition is met.
…detectPotentialLocksCycle()

Due to how code in `ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle()` is synchronized, it's possible for a thread to both own/hold a lock (according to `ReentrantCycleDetectingLock.lockOwnerThread`) and wait on the same lock (according to `CycleDetectingLock.lockThreadIsWaitingOn`). In this state, if another thread tries to hold the same lock, an endless loop will occur when calling `detectPotentialLocksCycle()`.

The change adds a workaround, forcing the cycle detection to exit if the above condition is met.

Workaround for: google#1510
…detectPotentialLocksCycle()

Due to how code in `ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle()` is synchronized, it's possible for a thread to both own/hold a lock (according to `ReentrantCycleDetectingLock.lockOwnerThread`) and wait on the same lock (according to `CycleDetectingLock.lockThreadIsWaitingOn`). In this state, if another thread tries to hold the same lock, an endless loop will occur when calling `detectPotentialLocksCycle()`.

With this change, `detectPotentialLocksCycle()` removes the lock-owning thread from `ReentrantCycleDetectingLock.lockOwnerThread` before checking for lock cycles. This prevents the endless loop during cycle detection.

Fix for: google#1510
…detectPotentialLocksCycle()

Due to how code in `ReentrantCycleDetectingLock.lockOrDetectPotentialLocksCycle()` is synchronized, it's possible for a thread to both own/hold a lock (according to `ReentrantCycleDetectingLock.lockOwnerThread`) and wait on the same lock (according to `CycleDetectingLock.lockThreadIsWaitingOn`). In this state, if another thread tries to hold the same lock, an endless loop will occur when calling `detectPotentialLocksCycle()`.

With this change, `detectPotentialLocksCycle()` removes the lock-owning thread from `ReentrantCycleDetectingLock.lockOwnerThread` if it detects that "this" lock is both waited on and owned by the same thread. This prevents the endless loop during cycle detection.

Fix for: google#1510
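The workaround commits above all guard on the same inconsistent state. As a rough sketch only, not Guice's actual internals (the real code in Guice's internal `CycleDetectingLock` differs in detail), the shape of the check is:

```java
import java.util.HashMap;
import java.util.Map;

// Deliberately simplified model, not Guice's real internals: each lock
// records its owner, and a global map records which lock each thread is
// currently waiting on.
final class ModelLock {
  static final Map<Thread, ModelLock> lockThreadIsWaitingOn = new HashMap<>();

  Thread lockOwnerThread; // thread currently holding this lock, if any

  void detectPotentialLocksCycle() {
    // Inconsistent state: the owner of this lock is also recorded as
    // waiting on it. Without this early exit, the cycle traversal keeps
    // revisiting the same owner->waiter edge, accumulating "cycle"
    // records until the JVM throws OutOfMemoryError.
    if (lockOwnerThread != null
        && lockThreadIsWaitingOn.get(lockOwnerThread) == this) {
      return;
    }
    // ... walk the owner/waiter graph looking for genuine cycles ...
  }
}
```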
…would previously fail:

* T1: Enter `lockOrDetectPotentialLocksCycle`:
  - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
  - Lock the cycle detecting lock (CDL)
  - Lock CAF.class, mark itself as `lockOwnerThread`, remove itself from `lockThreadIsWaitingOn`
  - Exit `lockOrDetectPotentialLocksCycle`
* T1: Re-enter `lockOrDetectPotentialLocksCycle`:
  - Lock CAF.class, add itself to `lockThreadIsWaitingOn`, unlock CAF.class
* T2: Enter `lockOrDetectPotentialLocksCycle`:
  - Lock CAF.class, invoke `detectPotentialLocksCycle`

At this point, `detectPotentialLocksCycle` will loop forever, because the `lockOwnerThread` is also in `lockThreadIsWaitingOn`. In the course of looping forever, it will OOM, because it is building up an in-memory structure of what it thinks are cycles.

The solution is to prevent the re-entrant T1 from adding itself to `lockThreadIsWaitingOn` if it is already the `lockOwnerThread`. It is guaranteed that it won't relinquish the lock concurrently, because it is the exact same thread that owns it.

Fixes #1635 and fixes #1510

PiperOrigin-RevId: 524376697
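The merged fix thus prevents the inconsistent state from arising at all, rather than bailing out of the traversal. A minimal sketch of the idea, again a simplified model and not Guice's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the fix described above: a thread that already
// owns the lock skips the waiting-set bookkeeping entirely. Only
// lockOwnerThread and lockThreadIsWaitingOn come from the commit
// message; everything else here is illustrative.
final class FixedModelLock {
  static final Map<Thread, FixedModelLock> lockThreadIsWaitingOn = new HashMap<>();

  Thread lockOwnerThread;

  void lockOrDetectPotentialLocksCycle() {
    Thread me = Thread.currentThread();
    if (lockOwnerThread == me) {
      // Re-entrant acquisition: this thread cannot relinquish the lock
      // concurrently with itself, so recording it as "waiting" is
      // unnecessary. Skipping it avoids the owner-also-waiting state
      // that made detectPotentialLocksCycle() loop forever.
      return;
    }
    lockThreadIsWaitingOn.put(me, this);
    // ... run cycle detection, then block on the underlying lock ...
    lockOwnerThread = me;
    lockThreadIsWaitingOn.remove(me);
  }
}
```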
Thanks for fixing this!
You're welcome. Sorry this was left unfixed for so long. I'm spending this & next week cleaning up Guice's PR and issue backlog, as a tribute to @crazybob. |
I did not know; this is very saddening :(
In some scenarios, the cycle detection leads the application to an OutOfMemoryError.
Specifically, this seems to happen if there is a circular dependency and multiple threads request instances involved in the cycle concurrently.
The following test reproduces the issue (not always, but often):
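A minimal sketch of such a test, assuming the setup described in this thread (two singletons in a cycle, resolved concurrently from several threads); all class names here are illustrative:

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.Singleton;
import java.util.ArrayList;
import java.util.List;

public class CycleOomReproducer {

  interface A {}

  interface B {}

  // Two singletons in a cycle; because A and B are interfaces, Guice
  // breaks the cycle with a circular proxy (enabled by default).
  @Singleton
  static class AImpl implements A {
    @Inject
    AImpl(B b) {}
  }

  @Singleton
  static class BImpl implements B {
    @Inject
    BImpl(A a) {}
  }

  public static void main(String[] args) throws InterruptedException {
    // The race is timing-dependent, so repeat many times.
    for (int run = 0; run < 100; run++) {
      Injector injector =
          Guice.createInjector(
              new AbstractModule() {
                @Override
                protected void configure() {
                  bind(A.class).to(AImpl.class);
                  bind(B.class).to(BImpl.class);
                }
              });
      List<Thread> threads = new ArrayList<>();
      for (int i = 0; i < 8; i++) {
        // Threads race to initialize the same circular singletons,
        // exercising the cycle-detecting lock during singleton creation.
        threads.add(new Thread(() -> injector.getInstance(A.class)));
      }
      threads.forEach(Thread::start);
      for (Thread t : threads) {
        t.join();
      }
    }
  }
}
```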
A sample output, when the issue occurs, is:
I'm running version 4.2.3
Please note that, unless I'm doing something wrong, this makes Guice's default settings harmful in a multi-threaded application.
In fact, the only way to fix the code above is to remove the cyclic dependency, and with the circular-proxy feature enabled (as it is by default) the cycle becomes harder to spot. Nobody running a multi-threaded application reasonably wants to risk an OutOfMemoryError, which is typically fatal for the application, triggered by race conditions, which can be hard to find while testing.
Most likely, as long as the bug is there, anyone running a multi-threaded application will want to disable the feature completely, for example by calling `Binder::disableCircularProxies()` to override the default behavior.
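For reference, `Binder.disableCircularProxies()` is part of Guice's public API and can be called from any module. A minimal sketch (the module name is illustrative):

```java
import com.google.inject.AbstractModule;

// Opting out of circular proxies. With proxies disabled, a dependency
// cycle fails fast at provisioning time with a clear error instead of
// being papered over by a runtime proxy.
public class NoCircularProxiesModule extends AbstractModule {
  @Override
  protected void configure() {
    binder().disableCircularProxies();
  }
}

// Usage:
//   Injector injector =
//       Guice.createInjector(new NoCircularProxiesModule() /*, other modules */);
```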