Remove aarch64Alpine from default jdk11u pipeline config due to test hangs reliability #349
Do we understand why? Was this a platform we intended to release in July PSU? |
Thank you for creating a pull request! Please check out the information below if you have not made a pull request here before (or if you need a reminder how things work). Code Quality and Contributing Guidelines: If you have not done so already, please familiarise yourself with our Contributing Guidelines and Code Of Conduct, even if you have contributed before. Tests: GitHub Actions will run a set of jobs against your PR that will lint and unit test your changes. Keep an eye out for the results from these on the latest commit you submitted. For more information, please see our testing documentation. In order to run the advanced pipeline tests (executing a set of mock pipelines), I require an admin to post |
@karianna aarch64Alpine is not for July |
@andrew-m-leonard I thought this was only affecting JDK11? Have the problems been seen on other versions? |
@sxa yes, seen on jdk17u as well. I'm just looking at last night's tests, which are currently in progress, and so far they are all still running... so I'm wondering if something may have fixed it? |
Got a link? https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-alpine-linux-aarch64-temurin/ doesn't look like it's experienced any hangs on an initial look |
|
run tests |
I'm not sure what I'm reading in there. That is in relation to pipeline 34, which appeared to fail somewhere in the GPG signing step (that job has now been deleted so I can't look into it). There was a comment which links to your issue adoptium/aqa-tests#3799. Are you sure that any delay in that pipeline wasn't just a hold-up caused by the executors being tied up by JDK11 jobs? All I can see from pipeline 34's subjobs (NOTE: it's a weekly pipeline so would not necessarily be directly comparable to the others) was https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.openjdk_aarch64_alpine-linux_testList_1/ which you killed #3 from even though it looks like it hadn't got to the end (NOTE that the previous runs of that job took longer than the 4h4 it was at when it was terminated) |
You may be right that this is only a jdk11u problem, although a jdk19 Smoke test has hung this morning |
To be 100% clear, that was a smoke test hang on x64, not aarch64, so it is not relevant to this PR, and from the look of it it's not new, as it's hit the timeout in all runs visible on https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk19/job/jdk19-alpine-linux-x64-temurin_SmokeTests/ |
Not sure we can totally rule that out, both Alpine |
PR TESTER RESULT ❎ Some pipelines failed or the job was aborted! ❎ |
Yes, I think it is the same for jdk17: the test jobs time out and are aborted. Test_openjdk17_hs_extended.openjdk_aarch64_alpine-linux_testList_0 ❌ ABORTED ❌ Test_openjdk17_hs_extended.openjdk_aarch64_alpine-linux_testList_1 ❌ ABORTED ❌ |
@sophia-guo Why do you say they timed out? The first of those was the top level job that had the two testList ones underneath it. The first, testList_0, was stopped by Andrew after 4h03m when it typically takes between 4 and 5 hours, so it did not time out, but was stopped before it got to the normal time it would take to complete:
The testList_1 job was the same: killed earlier than the amount of time it would normally take to run.
As a result of those two being aborted, the top level one was marked as aborted too. As I said earlier, I haven't seen any evidence that JDK17/alpine/aarch64 has experienced any unusual hangs, only forced aborts. https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_extended.openjdk_aarch64_alpine-linux_testList_1/1/ may have hit a timeout, but (a) there were a lot of hung Grinder processes on the machine at that time, which can be seen in the log, so I wouldn't count that run as conclusive evidence, and (b) it seems to have been trying to load the AWT libraries, so wasn't running in headless mode, which could have caused additional problems. My recommendation remains that if we're going to change this, we should be doing it for JDK11u/aarch64 ONLY. |
We believe this is just a jdk11u problem, so updated PR to only remove from jdk11u. |
run tests |
PR TESTER RESULT ❎ Some pipelines failed or the job was aborted! ❎ |
run tests |
PR TESTER RESULT ❎ Some pipelines failed or the job was aborted! ❎ |
Signed-off-by: Andrew Leonard <[email protected]>
run tests |
PR TESTER RESULT ❎ Some pipelines failed or the job was aborted! ❎ |
run tests |
I'm +1 on this change after restricting to JDK11u/aarch64.
We have adoptium/temurin-build#2961 to cover the development of a fix, after which we can re-enable.
PR TESTER RESULT ❎ Some pipelines failed or the job was aborted! ❎ |
Because the aarch64Alpine platform nearly always hangs during jdk11u test jobs, it is being removed from the default nightly jdk11u pipeline config.
See: adoptium/aqa-tests#3799
Signed-off-by: Andrew Leonard [email protected]