Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenJDK java/util/concurrent/SynchronousQueue/Fairness - Error: fair=true i=6 j=7 #18771

Open
pshipton opened this issue Jan 18, 2024 · 19 comments

Comments

@pshipton
Copy link
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_Release_testList_0/10/
jdk_util_1
java/util/concurrent/SynchronousQueue/Fairness.java

19:29:58  java.lang.Error: fair=true i=6 j=7
19:29:58  
19:29:58  	at Fairness.testFairness(Fairness.java:64)
19:29:58  	at Fairness.main(Fairness.java:73)
19:29:58  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
19:29:58  	at java.base/java.lang.reflect.Method.invoke(Method.java:586)
19:29:58  	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
19:29:58  	at java.base/java.lang.Thread.run(Thread.java:1595)
@pshipton
Copy link
Member Author

pshipton commented Jan 18, 2024

@pshipton
Copy link
Member Author

pshipton commented Jan 18, 2024

The grinder isn't finished, but this failure has already been repeated a number of times.
https://openj9-jenkins.osuosl.org/job/Grinder_iteration_0/413
https://openj9-jenkins.osuosl.org/job/Grinder_iteration_2/371
https://openj9-jenkins.osuosl.org/job/Grinder_iteration_0/414/

The failure is repeatable when running just the single test.

@tobi someone needs to look at this.

@keithc-ca
Copy link
Contributor

keithc-ca commented Jan 18, 2024

I think the test is broken: There's nothing in the code that prioritizes threads started earlier over those started later.

It is possible that the test is in a state where all the started threads are immediately after the call to lock.unlock() (but before the call to q.put(I)) before CPU cores are available to advance any of them. When processors are available, they can be scheduled in any order, providing no guarantee that the values are queued in order of ascending value.

By artificially inserting delays between those two calls (longer for smaller-numbered threads) the test can be made to fail at will.

@keithc-ca
Copy link
Contributor

In fact, even if all threads call Thread.sleep(0) the test fails regularly.

@pshipton
Copy link
Member Author

It's not a new test, and the test hasn't changed in years. It's not failing on other platforms, and we haven't seen it fail previously. I'm wondering what changed, and why it only fails on zlinux now.

@keithc-ca
Copy link
Contributor

There are changes to SynchronousQueue between the openj9 and openj9-staging branches. In particular, this commit makes reference to that test failing

8300663: java/util/concurrent/SynchronousQueue/Fairness.java failed with "Error: fair=true i=0 j=1"

I'll dig deeper to see if I can find an explanation there.

@keithc-ca
Copy link
Contributor

In https://bugs.java.com/bugdatabase/view_bug?bug_id=8300663 they seem to agree that testing fairness is difficult (emphasis mine):

I strongly suspect that any test trying to assert for fairness in this case will have to establish a total order, and as such total order is not likely to exist in any real-world use-case my bet is that fairness in this case (synchronous queue) is not observable.

@pshipton
Copy link
Member Author

If we're going to ignore the test failure, the test needs to be excluded. I'd start with just on zlinux unless we start to see it fail elsewhere.

@keithc-ca
Copy link
Contributor

The code for SynchronousQueue is the same in jdk21, jdk22 and jdknext. I just reproduced the problem with jdknext on xLinux. I suggest it should be excluded on all platforms.

@pshipton
Copy link
Member Author

ok

@keithc-ca
Copy link
Contributor

See adoptium/aqa-tests#4987.

@pshipton
Copy link
Member Author

pshipton commented Jan 19, 2024

For the record, it's not failing on the zlinux 21.0.2 Temurin build.
https://openj9-jenkins.osuosl.org/job/Grinder/3184/ (10x on 5 machines)

14:47:33  openjdk version "21.0.2-beta" 2024-01-16
14:47:33  OpenJDK Runtime Environment Temurin-21.0.2+13-202401172331 (build 21.0.2-beta+13-ea)
14:47:33  OpenJDK 64-Bit Server VM Temurin-21.0.2+13-202401172331 (build 21.0.2-beta+13-ea, mixed mode, sharing)

@pshipton pshipton added perm excluded The test is unsuitable and permanently excluded. and removed blocker labels Jan 19, 2024
@pshipton
Copy link
Member Author

Pls create a PR for https://github.com/adoptium/aqa-tests/tree/v1.0.0-release as well.

@keithc-ca
Copy link
Contributor

JasonFengJ9 pushed a commit to adoptium/aqa-tests that referenced this issue Jan 19, 2024
@keithc-ca
Copy link
Contributor

The test waits for the newly launched thread to leave the RUNNING state; normally that happens by parking the thread:

3XMTHREADINFO      "supplier#6" J9VMThread:0x0000000000228400, omrthread_t:0x00007F5D1C7AF070, java/lang/Thread:0x00000000FFEB90A0, state:P, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1C, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000258F20C8)
3XMTHREADINFO1            (native thread ID:0x1B6F9, native priority:0x5, native policy:UNKNOWN, vmstate:P, vm thread flags:0x00020081)
3XMTHREADINFO2            (native stack address range from:0x00007F5CFC003000, to:0x00007F5CFC083000, size:0x80000)
3XMCPUTIME               CPU usage total: 0.002249038 secs, current category="Application"
3XMTHREADBLOCK     Parked on: java/util/concurrent/SynchronousQueue$Transferer@0x00000000258FB378 Owned by: <unknown>
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=4096 (0x1000)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at jdk/internal/misc/Unsafe.park(Native Method)
4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.park(LockSupport.java:371)
4XESTACKTRACE                at java/util/concurrent/LinkedTransferQueue$DualNode.await(LinkedTransferQueue.java:458)
4XESTACKTRACE                at java/util/concurrent/LinkedTransferQueue.xfer(LinkedTransferQueue.java:613)
4XESTACKTRACE                at java/util/concurrent/SynchronousQueue.xfer(SynchronousQueue.java:233)
4XESTACKTRACE                at java/util/concurrent/SynchronousQueue.put(SynchronousQueue.java:264)
4XESTACKTRACE                at Fairness.unlockAndPut(Fairness.java:73)
4XESTACKTRACE                at Fairness.access$1(Fairness.java:70)
4XESTACKTRACE                at Fairness$1.run(Fairness.java:122)

Sometimes, however, there's contention for Thread.interruptLock between getState() and interrupted(). When the new thread loses, it becomes BLOCKED:

3XMTHREADINFO      "supplier#8" J9VMThread:0x000000000022FE00, omrthread_t:0x00007F5D1C7B8810, java/lang/Thread:0x00000000FFEC48A0, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x1E, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x00000000258F20C8)
3XMTHREADINFO1            (native thread ID:0x1B6FB, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2            (native stack address range from:0x00007F5CC42FF000, to:0x00007F5CC437F000, size:0x80000)
3XMCPUTIME               CPU usage total: 0.000705496 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/Thread.getAndClearInterrupt(Thread.java:1801)
5XESTACKTRACE                   (entered lock: java/lang/Object@0x00000000FFEC4928, entry count: 1)
4XESTACKTRACE                at java/lang/Thread.interrupted(Thread.java:1755)
4XESTACKTRACE                at java/util/concurrent/SynchronousQueue.put(SynchronousQueue.java:263)
4XESTACKTRACE                at Fairness.unlockAndPut(Fairness.java:73)
4XESTACKTRACE                at Fairness.access$1(Fairness.java:70)
4XESTACKTRACE                at Fairness$1.run(Fairness.java:122)

The call to Thread.interrupted() is a recent addition to SynchronousQueue.put() which explains why the test started failing.

Thread is modified for OpenJ9: interrupted() has synchronization (in getAndClearInterrupt()) that is not present upstream. It's not immediately obvious that that synchronization is necessary; perhaps it can be removed which would improve performance and allow the test to pass.

@pshipton
Copy link
Member Author

pshipton commented Feb 29, 2024

Are any changes for this expected in the next couple of weeks, or should this be moved forward to the next release?

@keithc-ca
Copy link
Contributor

The changes to Thread.getAndClearInterrupt() that were made in ibmruntimes/openj9-openjdk-jdk#746 may have resolved this for jdk23+. I think we should back-port that change to jdk22 and jdk21 which should fix this for those versions as well.

@keithc-ca
Copy link
Contributor

As we discovered in testing the related pull requests, removing the synchronization that breaks this test reintroduces a more serious, functional problem. I think the synchronization should be restored in the head stream until a solution that doesn't need it is discovered.

@keithc-ca
Copy link
Contributor

See ibmruntimes/openj9-openjdk-jdk#748 which restores that synchronization;

I think we should move this forward to the next release. I don't think it could cause any defensible real-world problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants