-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance for Cleaner implementation #1617
base: master
Are you sure you want to change the base?
Improve performance for Cleaner implementation #1617
Conversation
The Cleaner used multiple monitors to protect its datastructures. And as datastructre a (manually) linked list was used. The datastructure was updated to a ConcurrentHashMap and the multiple monitor usages are replaced with a ReentrandReadWriteLocks. Performance numbers: Commandline: java -jar target/benchmarks.jar -t 1000 -i 1 -wi 0 ========== 5.14.0 ========== Result "eu.doppelhelix.jna.jmh.MyBenchmark.testMethod": 1211666,184 ±(99.9%) 134595,856 ops/s [Average] (min, avg, max) = (1178371,132, 1211666,184, 1271195,212), stdev = 34954,116 CI (99.9%): [1077070,328, 1346262,040] (assumes normal distribution) Estimated CPU Load: 650% ========== 5.14.0 ========== Result "eu.doppelhelix.jna.jmh.MyBenchmark.testMethod": 3260953,271 ±(99.9%) 655799,010 ops/s [Average] (min, avg, max) = (3092006,068, 3260953,271, 3527896,224), stdev = 170308,920 CI (99.9%): [2605154,261, 3916752,281] (assumes normal distribution) Estimated CPU Load: 1500% ============================ Code: package eu.doppelhelix.jna.jmh; import com.sun.jna.internal.Cleaner; import java.util.concurrent.atomic.AtomicLong; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.infra.Blackhole; public class MyBenchmark { @benchmark public void testMethod(Blackhole blackhole) { DummyObject dummyObj = new DummyObject(); DummyObjectCleaner dummyCleaner = new DummyObjectCleaner(blackhole, dummyObj.getDummyValue()); Cleaner.getCleaner().register(dummyObj, dummyCleaner); } public static class DummyObject { private static final AtomicLong ai = new AtomicLong(); private final String dummyValue; public DummyObject() { this.dummyValue = "d " + ai.incrementAndGet(); } public String getDummyValue() { return dummyValue; } } public static class DummyObjectCleaner implements Runnable{ private final Blackhole bh; private final String data; public DummyObjectCleaner(Blackhole bh, String data) { this.bh = bh; this.data = data; } public void run() { this.bh.consume(this.data); } } }
For the interpretation of the performance numbers - the maximum CPU load archivable would be 1600% (8 core 2 threads). |
@matthiasblaesing Many thanks for your quick response. We will take it for a spin. It may take us some time, though, as we're experiencing this problem only in our production environment, and we have weekly development cycles with a week of testing in a preproduction environment. We were able to shift the load over the weekend so that we're now under the threshold at which this problem is manifesting. |
try { | ||
return add(new CleanerRef(this, obj, referenceQueue, cleanupTask)); | ||
} finally { | ||
synchronized (obj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially dangerous. Some other thread could be holding a lock on obj already.
This may be a non-issue because Cleaner is for internal use only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate, why you think we can deadlock here? For a deadlock we need two different locks, but Cleaner#register
itself only locks the monitor of obj
here. Sure, we can block at this position, but the other thread currently holding the monitor of obj
can finish the critical section, release the monitor and then we run to completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think of two threads. One locks object A and registers object B, the other one locks B and registers A.
Like I said, this is an unlikely constellation in Cleaner's internal use.
cleanerThreadLock.readLock().lock(); | ||
try { | ||
long count = trackedObjects.incrementAndGet(); | ||
if (cleanerThread == null && count > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because register()
ensures that ref
cannot be cleaned before add()
has completed, we know that count > 0
is always true here.
cleanerThread.start(); | ||
} | ||
} finally { | ||
cleanerThreadLock.readLock().lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock
is unlocked again immediately. That can be solved better without re-locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are mistaken. The readLock is locked and the writeLock is unlocked. This downgrades the ReadWriteLock from write to read. This is needed, as the readLock is unlocked in the finally
clause. My assumption here is, that while a thread holds the writeLock now other thread can enter it, thus taking the readLock can be done on a fast path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed, as the readLock is unlocked in the finally clause.
What I meant is that you could set a flag instead and skip the unlock in finally
if the flag is set.
registeredCleaners.append(cleanerRef.cleanupTask.toString()); | ||
break; | ||
} finally { | ||
cleanerThreadLock.readLock().lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. Re-locking can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
// Guard for trackedObjects and cleanerThread. The readlock is utilized when | ||
// the trackedObjects are manipulated, the writelock protectes starting and | ||
// stopping the CleanerThread | ||
private final ReadWriteLock cleanerThreadLock = new ReentrantReadWriteLock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the readlock is not necessary at all. trackedObjects
is an AtomicLong
and doesn't need any protection.
The important step is to protect the starting and stopping of the CleanerThread. In particular, it must be ensured that the thread doesn't terminate while a new reference is added without starting a new CleanerThread..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanerThreadLock
protects the interaction of trackedObjects
and cleanerThread
. The critical section is the starting and stopping of the Thead if and only iff the number of tracked objects reaches zero or is above 1.
This is the sequence I want to prevent:
T1
using the cleaner callsregister
and runs beforeincrementAndGet
is invokedCleanerThread
reaches thetrackedObjects.get() == 0
check and enters theif
block, but does not execute further yet.T1
executesincrementAndGet
and receives1
, checks the value ofcleanerThread
and finds aThread
there.CleanerThread
continues, clearscleanerThread
and finishes execution
There is a subtle issue there, the break
needs to move into the inner if
in line 173 (https://github.com/java-native-access/jna/pull/1617/files/b901c180f4a1026675a2859bbf24727574e19e6a#diff-11b3b0981223743529d972f317f26dfd9488b4a9b80a34a2d6f76cc2c58dcc00R173) into the inner if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I attempted to write some pseudocode to show how to do it with a single lock and found a race in it. The cause of the race is the double-check locking that you employ in order to achieve better parallelism.
Without the double-check a single lock is sufficient, but then you're back to a mutex block that is always executed, hence no more real parallelism.
I still wonder if a tiny mutex block would be faster than the read/write lock, in particular because the ReadWriteLock JavaDoc says just that:
Further, if the read operations are too short the overhead of the read-write lock implementation (which is inherently more complex than a mutual exclusion lock) can dominate the execution cost, particularly as many read-write lock implementations still serialize all threads through a small section of code.
FTR, I have run the overload test from my optimization branch and found that my original issue is not resolved by this change. |
return inChain; | ||
private void remove(CleanerRef ref) { | ||
map.remove(ref); | ||
cleanerThreadLock.readLock().lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lock here is not required. Like you said, the lock protects the interaction of cleanerThread
and trackedObjects
, but there is no interaction here.
The Cleaner used multiple monitors to protect its datastructures. And as datastructre a (manually) linked list was used. The datastructure was updated to a ConcurrentHashMap and the multiple monitor usages are replaced with a ReentrandReadWriteLocks.
Performance numbers:
Commandline:
java -jar target/benchmarks.jar -t 1000 -i 1 -wi 0
========== 5.14.0 ==========
Result "eu.doppelhelix.jna.jmh.MyBenchmark.testMethod":
1211666,184 ±(99.9%) 134595,856 ops/s [Average]
(min, avg, max) = (1178371,132, 1211666,184, 1271195,212), stdev = 34954,116
CI (99.9%): [1077070,328, 1346262,040] (assumes normal distribution)
Estimated CPU Load: 650%
========== 5.14.0 ==========
Result "eu.doppelhelix.jna.jmh.MyBenchmark.testMethod":
3260953,271 ±(99.9%) 655799,010 ops/s [Average]
(min, avg, max) = (3092006,068, 3260953,271, 3527896,224), stdev = 170308,920
CI (99.9%): [2605154,261, 3916752,281] (assumes normal distribution)
Estimated CPU Load: 1500%
============================
Code:
package eu.doppelhelix.jna.jmh;
import com.sun.jna.internal.Cleaner;
import java.util.concurrent.atomic.AtomicLong;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;
public class MyBenchmark {
}