fix: smoke test not working for jdk19/20 on alpine x64 #3088

Closed
wants to merge 12 commits into from

zdtsw
Contributor

@zdtsw zdtsw commented Aug 31, 2022

  • Suspect that Ant, running on an underlying JDK 19/20, does not work well with the "copy" task
  • This PR changes:
  1. Add a check that testng.xml exists before copying (this is a known issue in Ant: if the source file is missing, copy could hang)
  2. Use the cp command rather than the Ant copy task (we run on either Linux, Windows with Cygwin, or macOS, so cp should work in all three cases)
  3. Only testng.xml is needed for the smoke test, not build.xml, playlist.xml, or any makefile. But to stay aligned with the old code, run cp on all matching *.xml files
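
For readers who want to see the guard-then-copy idea outside of Ant, here is a minimal Python sketch. It is not the change made in this PR (which edits build.xml and calls cp); the function name `copy_test_xml` and the directory layout are hypothetical and used only for illustration.

```python
import shutil
from pathlib import Path


def copy_test_xml(src_dir, dest_dir):
    """Copy every *.xml file from src_dir to dest_dir, but only if testng.xml exists.

    Hypothetical helper: mirrors the "check first, then copy all *.xml" idea.
    Returns the sorted list of copied file names.
    """
    src_dir = Path(src_dir)
    dest_dir = Path(dest_dir)
    if not (src_dir / "testng.xml").is_file():
        # Fail fast instead of attempting a copy whose required source is missing.
        raise FileNotFoundError("testng.xml not found in {}".format(src_dir))
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for xml_file in sorted(src_dir.glob("*.xml")):
        shutil.copy2(str(xml_file), str(dest_dir / xml_file.name))
        copied.append(xml_file.name)
    return copied
```

Failing loudly when testng.xml is absent makes a missing source show up as an error in the console rather than as a job that sits until the Jenkins timeout.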

P.S.: this PR only tries to fix the hanging smoke test job in Jenkins. There is no solid evidence that Ant fails with jdk19/20, nor an explanation of why the hang is only seen on certain jobs on certain platform+OS combinations.

test run:
https://ci.adoptopenjdk.net/job/Grinder/5548/console is on jdk 19 alpine x64
https://ci.adoptopenjdk.net/job/Grinder/5552/console is windows jdk19 x64
https://ci.adoptopenjdk.net/job/Grinder/5553/console is mac jdk20 aarch64

Fix: #3031

@github-actions github-actions bot added the aarch, alpine-linux, macos, testing, and windows labels Aug 31, 2022
@llxia
Contributor

llxia commented Aug 31, 2022

If ant copy does not work for jdk19/20 on alpine x64, we should report this issue to ant.

@smlambert
Contributor

smlambert commented Aug 31, 2022

If ant copy is not working on x64 alpine-linux, how are the other test jobs running successfully, example:
https://ci.adoptopenjdk.net/job/Test_openjdk20_hs_sanity.functional_x86-64_alpine-linux/11/consoleFull

Based on the console outputs, some ant targets that also copy files run successfully in the smoke job...

22:33:03  dist_functional:
22:33:03       [copy] Copying 2 files to /home/jenkins/workspace/build-scripts/jobs/jdk19/jdk19-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional
22:33:03  
22:33:03  dist:
22:33:04        [jar] Building jar: /home/jenkins/workspace/build-scripts/jobs/jdk19/jdk19-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage/BuildAndPackagingTests.jar
22:33:04       [copy] Copying 3 files to /home/jenkins/workspace/build-scripts/jobs/jdk19/jdk19-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage
Cancelling nested steps due to timeout
08:30:24  Sending interrupt signal to process
08:30:28  143

The dist_functional target successfully copies 2 files to the workdir, then the dist target builds the jar successfully and hangs when trying to copy the 2 files plus the jar file. Is there something unusual about that jar file?

I see that in other test runs on x64 alpine-linux ant is able to build and copy xml and jar files, example from https://ci.adoptopenjdk.net/job/Test_openjdk20_hs_sanity.functional_x86-64_alpine-linux/11/consoleFull

07:27:23  dist:
07:27:23        [jar] Building jar: /home/jenkins/workspace/Test_openjdk20_hs_sanity.functional_x86-64_alpine-linux/jvmtest/functional/Java12andUp/GeneralTest.jar
07:27:23       [copy] Copying 3 files to /home/jenkins/workspace/Test_openjdk20_hs_sanity.functional_x86-64_alpine-linux/jvmtest/functional/Java12andUp
07:27:23  

Perhaps try a run with ant -verbose or ant -debug to see more about what is really happening?
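
When reproducing a hang like this locally, it can help to wrap the command in a timeout so the partial output is captured instead of waiting for the Jenkins job to be cancelled. Below is a minimal sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, and the `ant -verbose dist` invocation in the comment is only an example based on the target name seen in the console output above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False when the command was killed for exceeding timeout_s.
    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
        return True, result.stdout
    except subprocess.TimeoutExpired as exc:
        # Whatever was printed before the kill is still useful for diagnosis.
        output = exc.output or ""
        if isinstance(output, bytes):
            output = output.decode(errors="replace")
        return False, output


# Example (assumes ant is on PATH and the current dir holds the build.xml):
# finished, log = run_with_timeout(["ant", "-verbose", "dist"], 600)
```

The last lines of the captured output then show which file the copy step was working on when it stalled.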

@zdtsw
Contributor Author

zdtsw commented Aug 31, 2022

This is the part that confuses me: why does it only happen on the smoke test and not on the other tests on jdk19/20?
The GH Action uses a different runner, with adoptopenjdk/alpine3_build_image on Alpine 3.16, which is not the image we use to set up the Jenkins agent.
I have a run with -d -verbose: https://ci.adoptopenjdk.net/job/Grinder/5556/console

16:23:47        [jar] Location: /home/jenkins/workspace/Grinder/aqa-tests/functional/buildAndPackage/build.xml:53: 
16:23:47       [copy] Copying 3 files to /home/jenkins/workspace/Grinder/jvmtest/functional/buildAndPackage
16:23:47       [copy] Copying /home/jenkins/workspace/Grinder/aqa-tests/functional/buildAndPackage/build.xml to /home/jenkins/workspace/Grinder/jvmtest/functional/buildAndPackage/build.xml
Aborted by [Wen Zhou](https://ci.adoptopenjdk.net/user/zdtsw)
16:54:55  Sending interrupt signal to process

@smlambert
Contributor

Not sure how the GitHub runner info is relevant, as your Grinder is running on test-docker-alpine314-x64-1 and so are the smoke tests; they are not run on GitHub runners. Is test-docker-alpine314-x64-1 set up using the alpine3_build_image?

@zdtsw
Contributor Author

zdtsw commented Aug 31, 2022

Not sure how the GitHub runner info is relevant, as your Grinder is running on test-docker-alpine314-x64-1 and so are the smoke tests; they are not run on GitHub runners. Is test-docker-alpine314-x64-1 set up using the alpine3_build_image?

We have no problem running these smoke tests in GH Actions on all JDK versions.
All of them run from alpine3_build_image.
These Jenkins agents (e.g. test-docker-alpine314-x64-1) are set up by an Ansible playbook based on different Dockerfiles (Alpine 3.11, 3.12 and 3.14).

@smlambert
Contributor

Thanks @zdtsw!

I see now that it is mentioned in the issue this PR is intended to fix. May I ask that you use the Closes or Fixes keyword in your PRs so that one can easily locate the related issue? I missed the reference to #3031.

It seems that would be quite a relevant difference.

Contributor

@karianna karianna left a comment

XML fix looks better. Do we know historically why we shipped the mk files?

test/functional/buildAndPackage/build.xml (outdated)
@zdtsw
Contributor Author

zdtsw commented Sep 1, 2022

XML fix looks better. Do we know historically why we shipped the mk files?

Not really sure.
Maybe when the smoke tests were added to temurin-build, they did the same as aqa-tests was doing:
https://github.com/adoptium/aqa-tests/blob/master/functional/security/build.xml#L57

@karianna
Contributor

karianna commented Sep 1, 2022

https://github.com/adoptium/aqa-tests/blob/master/functional/security/build.xml#L57

git blame tells me @smlambert authored that back in 2020 ;-) - Hey Shelley, any chance you recall this from memory lane?

@smlambert
Contributor

smlambert commented Sep 1, 2022

Likely to handle future cases where we may choose to handle nested test dirs (as we do for other types of testing and knowing we intend to continue adding smoke tests).

As discussed in a call this morning, I do not want this PR merged as a workaround. I want to continue to dig to uncover the root cause of the problem first, as this PR is a big hack around a real problem that I'd like us to try a bit longer to understand and resolve before working around the unknown.

@smlambert
Contributor

smlambert commented Sep 1, 2022

Diffing a failing jdk20 run against a passing jdk18u smoke test run, up to the point of the Ant dist target where jdk20 hangs, to see if anything looks off or can tell us more:

| Failing jdk20 run | Passing jdk18u run |
| --- | --- |
| Running on test-docker-alpine314-x64-2 in /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests | Running on test-docker-alpine311-x64-1 in /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests |
| NODE_LABELS=ci.role.test hw.arch.x86 sw.os.alpine-linux test-docker-alpine314-x64-2 | NODE_LABELS=AMD ci.role.test hw.arch.x86 sw.os.alpine-linux test-docker-alpine311-x64-1 |
| 00:27:13 =JAVA VERSION OUTPUT BEGIN=openjdk version "20-beta" 2023-03-21 OpenJDK Runtime Environment Temurin-20+12-202208310337 (build 20-beta+12-202208310337) OpenJDK 64-Bit Server VM Temurin-20+12-202208310337 (build 20-beta+12-202208310337, mixed mode, sharing) =JAVA VERSION OUTPUT END= =RELEASE INFO BEGIN= IMPLEMENTOR="Eclipse Adoptium" IMPLEMENTOR_VERSION="Temurin-20+12-202208310337" JAVA_VERSION="20" JAVA_VERSION_DATE="2023-03-21" | 20:21:33 =JAVA VERSION OUTPUT BEGIN=openjdk version "18.0.2.1-beta" 2022-08-18 OpenJDK Runtime Environment Temurin-18.0.2.1+1-202208312342 (build 18.0.2.1-beta+1-202208312342) OpenJDK 64-Bit Server VM Temurin-18.0.2.1+1-202208312342 (build 18.0.2.1-beta+1-202208312342, mixed mode, sharing) =JAVA VERSION OUTPUT END= =RELEASE INFO BEGIN= IMPLEMENTOR="Eclipse Adoptium" IMPLEMENTOR_VERSION="Temurin-18.0.2.1+1-202208312342" JAVA_VERSION="18.0.2.1" JAVA_VERSION_DATE="2022-08-18" |
| Could not add alternate for '/home/jenkins/openjdk_cache': reference repository '/home/jenkins/openjdk_cache' is not a local repository. Updating files: 52% (5270/10087) Updating files: 53% (5347/10087) Updating files: 99% (9987/10087) Updating files: 100% (10087/10087) Updating files: 100% (10087/10087), done. check OpenJ9 Repo sha /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/scripts/getSHA.sh --repo_dir /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/aqa-tests/openj9 --output_file /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/SHA.txt Check sha in /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/aqa-tests/openj9 and store the info in /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/SHA.txt | Could not add alternate for '/home/jenkins/openjdk_cache': reference repository '/home/jenkins/openjdk_cache' is not a local repository. check OpenJ9 Repo sha /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/scripts/getSHA.sh --repo_dir /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/aqa-tests/openj9 --output_file /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/SHA.txt Check sha in /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/aqa-tests/openj9 and store the info in /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/aqa-tests/TKG/SHA.txt |
| This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi | This is perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-thread-multi |
| cpuCores : 56 | cpuCores : 48 |
| GNU Make 4.3 | GNU Make 4.2.1 |
| dist: [jar] Building jar: /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage/BuildAndPackagingTests.jar [copy] Copying 3 files to /home/jenkins/workspace/build-scripts/jobs/jdk/jdk-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage Aborted by Wen Zhou | dist: [jar] Building jar: /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage/BuildAndPackagingTests.jar [copy] Copying 3 files to /home/jenkins/workspace/build-scripts/jobs/jdk18u/jdk18u-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage clean: ... continues to success |
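
A side-by-side comparison like this is easier to repeat if the per-run noise (timestamps, workspace paths) is stripped before diffing. The following is a rough sketch using Python's standard difflib; the helper names, the timestamp pattern, and the `<WORKSPACE>` placeholder are assumptions made for illustration, not part of any Adoptium tooling.

```python
import difflib
import re

# Jenkins console lines in this thread start with an HH:MM:SS timestamp.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\s+")


def normalize(lines, workspace):
    """Drop leading timestamps and replace the job workspace path with a placeholder."""
    cleaned = []
    for line in lines:
        line = TIMESTAMP.sub("", line.rstrip("\n"))
        cleaned.append(line.replace(workspace, "<WORKSPACE>"))
    return cleaned


def diff_logs(failing, failing_ws, passing, passing_ws):
    """Return unified-diff lines (no context) between two normalized console logs."""
    return list(
        difflib.unified_diff(
            normalize(failing, failing_ws),
            normalize(passing, passing_ws),
            fromfile="failing",
            tofile="passing",
            lineterm="",
            n=0,
        )
    )
```

With the workspace prefix of each job passed in, lines that differ only by timestamp or job path drop out, leaving differences such as tool versions and node labels.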

Notables / questions:

  • we should try running the jdk20 smoke tests explicitly on test-docker-alpine311-x64-1 to see if they behave differently (Grinder/5577): looks like it also hangs

  • why does the jdk18u job get sent to NODE_LABELS=AMD && test labels? Is it explicitly set somewhere?

  • is jdk20 considered tip, so that it uses jdk rather than jdkXu in job and dir names? Does this matter to smoke tests? It does not to other tests. (I do NOT expect this to be an issue, but am noting it just in case)

  • perl and make versions differ, if that matters

  • what files are getting updated in the jdk20 run?

  • also rerun the -d -verbose Grinder (https://ci.adoptopenjdk.net/job/Grinder/5556/), but on a known-to-pass jdk18u run, in Grinder/5578, to see the diff

    • git versions also differ on different alpine machines (git version 2.32.0 on git version 2.26.3 versus git version 2.26.3 on test-docker-alpine312-x64-2)

@zdtsw zdtsw marked this pull request as draft September 3, 2022 09:44
@zdtsw
Contributor Author

zdtsw commented Dec 7, 2022

Closing this PR; it looks like both the jdk19 and jdk20 smoke tests have been working on alpine x64 since 24th Nov.

@zdtsw zdtsw closed this Dec 7, 2022
Successfully merging this pull request may close these issues.

jdk(20) alpine x64 linux smoke test hanging
4 participants