System unavailable: test-macstadium-macos1014-x64-1 has no space left on device #3017

Closed
adamfarley opened this issue Apr 4, 2023 · 5 comments


@adamfarley
Contributor

https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac/125/console

java.nio.file.FileSystemException: /home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_mac: No space left on device

The machine may not be completely down, but I don't see "No space left on device" being good for the health of any jobs that run on it.

Requesting that an admin take a look to see why we're out of space, or grant me SSH access so I can do it.

Thank you. :)
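As a rough illustration of the kind of check an admin (or a pre-flight pipeline step) could run before sending more jobs to a node, here is a minimal Python sketch that tests whether the filesystem holding the Jenkins workspace has enough free space. The 10 GiB threshold and the function name are hypothetical; the thread does not say how much space a test node actually needs.

```python
import shutil

# Hypothetical threshold: not taken from the Adoptium infrastructure.
MIN_FREE_BYTES = 10 * 1024**3  # 10 GiB

def has_enough_space(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem containing `path` has at least `min_free` bytes free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free >= min_free

# Example: check the workspace root seen in the exception above.
# has_enough_space("/home/jenkins/workspace")
```

Note that this only reports the symptom; finding out *why* the disk filled up (old workspaces, core dumps, Docker images, etc.) still needs a manual look on the machine.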

@sxa
Member

sxa commented Apr 5, 2023

There are no issues with space on that system at present. However, I am a bit perplexed as to why the mac test jobs are scheduling part of their execution on Alpine hosts. That seems like a potential bug (and means that the title versus the description of this issue is somewhat confusing!)

@adamfarley adamfarley changed the title System unavailable: test-docker-alpine313-aarch64-1 has no space left on device System unavailable: test-macstadium-macos1014-x64-1 has no space left on device Apr 5, 2023
@adamfarley
Contributor Author

Hah, good catch. I must have copy-pasted the host name from the wrong job. I'm sorry.

Fixed now. :)

@smlambert
Contributor

Your original report was correct, Adam, and this is working as currently designed: the child test jobs are sent off to x86-64_mac nodes to run, and afterwards the artifacts from those child jobs are gathered and archived on the parent test job. That archiving step runs on any available node, since it is not platform-specific.
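To make that scheduling behaviour concrete, here is a simplified Python model (not Jenkins code) of label-based node selection. Child jobs ask for a platform label, while the unlabelled archive step accepts any online node, which is how a mac job's collation step can land on an Alpine aarch64 host. The labels `hw.arch.aarch64` and `worker` are hypothetical, and "first matching online node" stands in for Jenkins' real load-based choice.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    labels: set = field(default_factory=set)
    online: bool = True

def pick_node(nodes: list, label: Optional[str] = None) -> Node:
    """Return the first online node carrying `label`; any online node if label is None."""
    for node in nodes:
        if node.online and (label is None or label in node.labels):
            return node
    raise LookupError(f"no online node matches label {label!r}")

nodes = [
    Node("test-docker-alpine313-aarch64-1", {"hw.arch.aarch64"}),  # hypothetical label
    Node("test-macstadium-macos1014-x64-1", {"x86-64_mac"}),
    Node("worker", {"worker"}),  # hypothetical node/label for collation steps
]

child = pick_node(nodes, "x86-64_mac")  # child test jobs: restricted to mac nodes
archive = pick_node(nodes)              # unlabelled archive step: any online node
pinned = pick_node(nodes, "worker")     # archive step pinned to a dedicated label
```

In this model the unlabelled step picks the Alpine host, while pinning it to a label keeps it off general-purpose test machines. Marking the Alpine node offline would also steer the unlabelled step elsewhere.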

08:35:13  Starting building: [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_2 #82](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_2/82/)
08:35:13  Starting building: [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_0 #92](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_0/92/)
08:35:13  Starting building: [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_1 #92](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_1/92/)
09:30:07  Build [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_0 #92](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_0/92/) completed: SUCCESS
[Pipeline] }
09:55:02  Build [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_1 #92](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_1/92/) completed: SUCCESS
[Pipeline] }
10:36:33  Build [Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_2 #82](https://ci.adoptium.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_mac_testList_2/82/) completed: SUCCESS
[Pipeline] }
[Pipeline] }
[Pipeline] // parallel
[Pipeline] node
10:36:34  Running on [test-docker-alpine313-aarch64-1](https://ci.adoptium.net/manage/computer/test-docker-alpine313-aarch64-1/) in /home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_mac
[Pipeline] {
[Pipeline] cleanWs
[Pipeline] echo
10:36:34  Exception: java.nio.file.FileSystemException: /home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_mac: No space left on device
[Pipeline] sh

In any event, we never want to find ourselves in a situation where any online machine has 'no space left on device'. If we do, we would benefit from taking the machine offline and/or correcting it immediately, before many more jobs are sent to it only to fail. We do have a mechanism in the test pipelines to take machines offline when certain types of failures like this occur, but we have never employed this feature on the ci.adoptium.net Jenkins server to date.
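The decision part of such a mechanism can be sketched as a pattern match over a job's console output. This is a minimal Python sketch, not the actual test-pipeline implementation: the pattern list and the function name `should_take_offline` are hypothetical, and only the failure seen in this issue is included. The step that actually marks the node offline in Jenkins is not shown.

```python
import re

# Hypothetical pattern list: the thread does not show which failures the
# real test pipeline reacts to, so only the one from this issue is included.
OFFLINE_PATTERNS = [
    re.compile(r"No space left on device"),
]

def should_take_offline(console_text: str) -> bool:
    """Return True if the console output contains a failure that warrants taking the node offline."""
    return any(p.search(console_text) for p in OFFLINE_PATTERNS)

failure = (
    "Exception: java.nio.file.FileSystemException: "
    "/home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_mac: "
    "No space left on device"
)
```

Keeping the patterns in one list makes it easy to add other node-level failures later without touching the decision logic.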

@sxa
Member

sxa commented Apr 6, 2023

Hmm, so the "parent test job" that does the archiving isn't currently tied to the master (or worker) node? I thought we had set most jobs that don't care where they run to use that node for consistency, but maybe that didn't make it into the test pipelines.

@sxa
Member

sxa commented Apr 6, 2023

On the basis of the above aqa-tests issue talking about the worker node, I'm going to close this, as the space problem is resolved. Further discussion on the labels to use for the collation jobs can continue there.

@sxa sxa closed this as completed Apr 6, 2023