jdk(20) alpine x64 linux smoke test hanging #3031

Closed
andrew-m-leonard opened this issue Jul 8, 2022 · 14 comments
Labels
alpine-linux Issues that affect or relate to the Alpine LINUX OS buildbreak High priority issues that cause build breaks in jenkins or build scripts x-linux Issues that affect or relate to the x64/x32 LINUX OS

Comments

@andrew-m-leonard
Contributor

https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk/job/jdk-alpine-linux-x64-temurin_SmokeTests/

@andrew-m-leonard andrew-m-leonard added the buildbreak High priority issues that cause build breaks in jenkins or build scripts label Jul 8, 2022
@github-actions github-actions bot added alpine-linux Issues that affect or relate to the Alpine LINUX OS x-linux Issues that affect or relate to the x64/x32 LINUX OS labels Jul 8, 2022
@zdtsw
Contributor

zdtsw commented Aug 23, 2022

It seems jdk20 is not the only version with this problem; jdk19 hangs as well:
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk19/job/jdk19-alpine-linux-x64-temurin_SmokeTests/17/console

00:39:11.008       [copy] Copying 3 files to /home/jenkins/workspace/build-scripts/jobs/jdk19/jdk19-alpine-linux-x64-temurin_SmokeTests/jvmtest/functional/buildAndPackage
Cancelling nested steps due to timeout
10:36:28.088  Sending interrupt signal to process

@zdtsw
Contributor

zdtsw commented Aug 23, 2022

@zdtsw
Contributor

zdtsw commented Aug 30, 2022

I still do not understand why the test works in GitHub Actions on both jdk19 and jdk20,
e.g. https://github.com/adoptium/temurin-build/runs/8045139038?check_suite_focus=true

It seems GHA uses the adoptopenjdk/alpine3_build_image image,
but the Jenkins Docker Alpine agent uses a different Dockerfile, from infrastructure/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/DockerStatic/Dockerfiles

@zdtsw
Contributor

zdtsw commented Aug 31, 2022

So I ran a test: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk19/job/jdk19-alpine-linux-x64-temurin_SmokeTests/23/console
The change is to comment out this block

<!-- <copy todir="${DEST}">
           <fileset dir="${src}/../" includes="*.xml"/>
           <fileset dir="${src}/../" includes="*.mk"/>
       </copy> -->

in build.xml. It seems the part that hangs is the copy of the 3 files.

@zdtsw
Contributor

zdtsw commented Aug 31, 2022

Since I cannot abort an ongoing pipeline, I ran a new test on Grinder: https://ci.adoptopenjdk.net/job/Grinder/5519/console
Basically, it moves the "copy" into a dedicated target, with a check that the files are there so that the job fails instead of hanging.
I think it is a known issue with ant that a copy just hangs if the source files do not exist. The output shows testng.xml (or any *.xml) is missing.

After more testing,
it seems the problem is not that the file is missing: the copy does not work at all, whether it uses a matching pattern or an explicit file name.
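
A minimal sketch of the dedicated-target approach described above, assuming the same `${src}` and `${DEST}` properties as the block quoted from build.xml. The target name `copy.test.config` is hypothetical, and the actual change on the test branch may look different. The idea is only that a missing source file produces an explicit failure before the copy is attempted:

```xml
<!-- Hypothetical target name; ${src} and ${DEST} are assumed to be defined
     elsewhere in build.xml, as in the block quoted earlier. -->
<target name="copy.test.config">
    <!-- Fail with a clear message if the expected file is absent. -->
    <fail message="testng.xml not found in ${src}/..">
        <condition>
            <not>
                <available file="${src}/../testng.xml"/>
            </not>
        </condition>
    </fail>
    <!-- Same copy as before, now isolated in its own target. -->
    <copy todir="${DEST}">
        <fileset dir="${src}/../" includes="*.xml"/>
        <fileset dir="${src}/../" includes="*.mk"/>
    </copy>
</target>
```

A check like this only rules out the missing-file explanation. It would not prevent a hang inside the copy task itself.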

@zdtsw
Contributor

zdtsw commented Aug 31, 2022

https://ci.adoptopenjdk.net/job/Grinder/5541/console is jdk18 with my test branch issue/3031, which explicitly copies two xml files.
That works,
but it does not work on jdk19: https://ci.adoptopenjdk.net/job/Grinder/5534/console

@zdtsw
Contributor

zdtsw commented Aug 31, 2022

https://ci.adoptopenjdk.net/job/Grinder/5548/console is on jdk19.
When I change from the Ant "copy" task to an exec of cp, it works.
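
A sketch of what replacing the Ant copy task with an exec of `cp` could look like. It assumes the same `${src}` and `${DEST}` properties as the original block and uses `testng.xml` as the example file; the exact change in the test branch is not shown in this thread. Ant's `<exec>` does not run the command through a shell, so wildcards such as `*.xml` are not expanded and each file has to be named explicitly:

```xml
<!-- Illustrative only: copies one explicitly named file with the system cp.
     osfamily="unix" restricts the task to Unix-like platforms. -->
<exec executable="cp" failonerror="true" osfamily="unix">
    <arg value="${src}/../testng.xml"/>
    <arg value="${DEST}"/>
</exec>
```

With `failonerror="true"` the build fails if `cp` returns a non-zero exit code, so a missing source file fails the build with an error.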

@zdtsw
Contributor

zdtsw commented Aug 31, 2022

https://ci.adoptopenjdk.net/job/Grinder/5552/console is the same code but run for windows jdk19
https://ci.adoptopenjdk.net/job/Grinder/5553/console for mac jdk20

Could the problem be that jdk19/20 does not work well with ant 1.10.9 for the copy task?

14:58:47  Run D:\jenkins\workspace\Grinder/openjdkbinary/j2sdk-image/bin/java -version
14:58:47  =JAVA VERSION OUTPUT BEGIN=
14:58:47  openjdk version "19-beta" 2022-09-20

@smlambert
Contributor

Based on all of the information we have gathered so far, here is what we know about this smoke test:

  • Passes on other platforms, only hanging on alpine-linux
  • Passes on jdk18u and earlier, only hanging on jdk19 & jdk20
  • The ant dist target runs fine and copies things well in other test jobs for alpine-linux jdk19 and jdk20
  • Hangs on all machines labelled ci.role.test&&hw.arch.x86&&sw.os.alpine-linux
  • Passes when run in a github workflow environment

Because this test runs fine in a github workflow environment and other test jobs do not hang, it reminded me of another alpine-linux problem that needs to be addressed (which I think is related to, or the actual cause of, this problem): the smoke test job does not seem to follow the naming convention of the other test jobs, as evidenced by how it gets displayed in TRSS:

[Screenshot: TRSS Grid view, 2022-09-02]

I believe that if we correct that naming issue, we will no longer see this hang. I suspect, but have not confirmed, that some dependent ant targets defined in TKG/scripts/build_test.xml create dirs based on the known platform name x86-64_alpine-linux rather than x64_alpine-linux.

@llxia
Contributor

llxia commented Sep 2, 2022

Maybe @renfeiw can comment from TKG perspective.

TRSS sets the platform based on the job name. In this case, we are using 2 different naming conventions for the alpine linux platform, which causes a mismatch in the TRSS Grid view, as shown in the screenshot above.

  • the smoke test job name - jdk-alpine-linux-x64-temurin_SmokeTests
  • the regular test job name - Test_openjdk19_hs_sanity.openjdk_x86-64_alpine-linux

This is a known issue adoptium/aqa-test-tools#695

@zdtsw
Contributor

zdtsw commented Sep 2, 2022

Thanks @llxia !
But does that mean it is just how TRSS presents the results under a different naming convention, and not really something related to running the test?

@smlambert
Contributor

What I am wondering is what this block of code does in the alpine-linux case for smoke tests:
https://github.com/adoptium/ci-jenkins-pipelines/blob/master/pipelines/build/common/openjdk_build_pipeline.groovy#L193-L198

@zdtsw
Contributor

zdtsw commented Nov 17, 2022

some findings:
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk19u/job/jdk19u-alpine-linux-x64-temurin_SmokeTests/7/
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk19u/job/jdk19u-alpine-linux-x64-temurin_SmokeTests/9/
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk/job/jdk-alpine-linux-x64-temurin_SmokeTests/72/
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk/job/jdk-alpine-linux-x64-temurin_SmokeTests/71/

These are all the "green" ones.
What these builds have in common is that they ran on
test-docker-alpine314-x64-1-NEW
test-docker-alpine314-x64-2-NEW
Could it be that alpine314 works but alpine312 does not, or is it the -NEW nodes?
So I ran a test bound to an old alpine314 node: https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk/job/jdk-alpine-linux-x64-temurin_SmokeTests/75/console . It hangs in the same place.

=> Only the two new nodes, test-docker-alpine314-x64-1-NEW and test-docker-alpine314-x64-2-NEW, work.
=> The underlying VM running the container changed from ubuntu2004 to ubuntu2204.

@zdtsw
Contributor

zdtsw commented Dec 7, 2022

Closing this issue: both the jdk19 and jdk20 smoke tests have worked on alpine x64 since 24th Nov.
The problem is related to the Jenkins agents we use.
Once they were replaced with the new ones, everything went well.
