setup-buildx-action hangs with 99% CPU when running on latest Github Action Runner v2.285.0 #117
Comments
I just ran into the same issue, as I attempt a new setup using private hosted runners. I thought it was related to running this in k8s, but this issue describes the exact same behavior. |
I'd also like to confirm that when I switch to
apiVersion: actions.summerwind.dev/v1alpha1
kind: Runner
metadata:
  name: example-runner
  namespace: actions-runner-system
spec:
  repository: <redacted>
  image: summerwind/actions-runner:v2.284.0-ubuntu-20.04
  env: []
|
@ghostsquad when you make this change ( Another interesting thing I discovered is that this action runs using v12 but the latest GitHub runner (v2.285.0) shipped with |
Interesting indeed, are you on GHE? |
Our e2e workflow works fine with Are you using a self-hosted runner? |
We're seeing the same error on self-hosted runners. Version auto-changed overnight and now the job is timing out on the
|
Yet another "same here". Self-hosted k8s runners on a private repo stuck forever at the buildx step. |
Seeing the same here, first noted on Tuesday afternoon (UK time) |
Can someone provide the workflow logs of an affected runner please (private info redacted ofc)? Thanks. |
Set-up job logs:
|
@jnsvd I'm really confused. Can you add this step before the
- name: Install Docker Buildx
  run: |
    mkdir -p "$HOME/.docker/cli-plugins"
    curl -SsL "https://github.com/docker/buildx/releases/download/v0.7.1/buildx-v0.7.1.linux-amd64" -o "$HOME/.docker/cli-plugins/docker-buildx"
    chmod +x "$HOME/.docker/cli-plugins/docker-buildx"
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v1
|
@crazy-max If we add the extra |
@crazy-max Step |
I can confirm that the |
Ok thanks for your feedback! One last thing, can you add the
This is not about the |
Let me know if you need more logs than this, but you can see that the fixed version goes to
|
There's no more output when enabling diagnostic logging. The pipeline really seems to get stuck on |
Would it be possible to add more logging for each of the steps? That might help future debugging |
Oh my bad it's |
Friendly ping @thboop @luketomlinson. Are you aware of some issues with the We use the |
Regarding adding more debug logging, I was stuck yesterday even though I had |
It looks like this line is the culprit: setup-buildx-action/src/buildx.ts Line 200 in 79abd3f
I wonder if this is not a bug with No specific reports on your side @TingluoHuang @thboop? |
@crazy-max |
Yes indeed, I'm trying to find the root cause. Maybe it has been backported to 12.22.7; I'm not quite sure. Anyway, I tried again on a fresh runner 2.285.0 and can't repro (Ubuntu 20.04): @richardpeng @dresselm @jnsvd @rjhenry @ghostsquad and others with this issue, can you show your |
I think it might have something to do with the docker container volume mount. I am playing with it right now. |
@TingluoHuang Thanks! |
it's related to the double copy... 😵💫
|
Ok so cleaning dest beforehand should mitigate this issue in the meantime. |
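A minimal sketch of that mitigation as a shell helper that could run before the buildx setup step. It assumes the destination in question is the `buildx` directory under the runner's tool cache, which is the path removed by the loop posted later in this thread; the function name `clean_buildx_cache` is hypothetical, and `RUNNER_TOOL_CACHE` is the environment variable GitHub Actions sets for the tool cache location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a cached buildx directory so that a later
# copy into it starts from an empty destination.
clean_buildx_cache() {
  # Assumption: the cache lives under the runner tool cache. The fallback
  # path mirrors the one used in the kubectl loop in this thread.
  local cache_root="${1:-${RUNNER_TOOL_CACHE:-/opt/hostedtoolcache}}"
  local target="${cache_root}/buildx"

  if [ -d "$target" ]; then
    rm -rf -- "$target"
    echo "removed ${target}"
  else
    echo "nothing to remove at ${target}"
  fi
}
```

Calling `clean_buildx_cache` with no argument uses the runner's tool cache; passing a directory overrides it, which is handy for trying the helper outside a runner.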
I've tried to downgrade to |
Yeah, unfortunately the runners self-update. I'm running a very hacky loop on my local machine to keep pipelines running for myself - fortunately, I'm the one triggering most of these pipelines so it's only me inconvenienced.
while true; do
  for pod in $(k get po -n github-actions-runner -l runner-deployment-name=assureddt -o jsonpath='{.items[*].metadata.name}'); do
    k exec -it -n github-actions-runner ${pod} -c runner -- rm -r /opt/hostedtoolcache/buildx/
  done
  sleep 60
done
It's dirty and I'm not proud of it, but it is working. |
@maxnowack #117 (comment) should mitigate your issue in the meantime. |
Should be solved since https://github.com/actions/runner/releases/tag/v2.285.1 |
I can confirm, that it is resolved with the new runner version 🙂 |
we are now running into this issue since our runners updated to the extra |
@novascreen Must be linked to actions/runner#1651 which updates node 12 with the same issue. Suggest to open an issue on https://github.com/actions/runner. cc @TingluoHuang @thboop |
@novascreen what Linux kernel version is your environment on? There was a kernel bug, patched in Linux 5.4, which breaks You can read more from actions/runner#1536 |
FYI, this is also happening on runner version In the "Docker info" section for
Not sure if that is significant or not. Then on the |
This latest GHA runner, Theoretically, GHA-folk believe that since the Kernel issue is fixed, they are good to go with the |
@crazy-max @thril based on a comment with @crazy-max, it appears that we will need to update our runners to use the latest Linux (or other *nix) kernel. Seems like this is not the most customer-friendly way for GHA to manage this upgrade. Seems like recommending the OS-level upgrade first (with a node deprecation warning) would have been better. |
@crazy-max any inkling of which runner OS will work today? |
We ended up pinning our runner to version |
Any idea @TingluoHuang? If you're using Ubuntu you can check https://bugs.launchpad.net/ubuntu/+source/linux-base/+bug/1953199
|
Can we re-open this issue since we (and other people in this thread) are still running into this problem? |
OK, what solved the issue for us was pinning the version to 287.1 and disabling the auto update of the runner:
|
Afaik it's a runner/kernel issue, not the action itself. |
Just to help anyone else out looking at this thread, a Linux kernel fix (at least 5.4) will address this issue. |
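For anyone wanting to check a runner host quickly, here is a rough sketch that compares the running kernel against 5.4. The helper name `version_at_least` is made up for this example, and it relies on GNU `sort -V`. Note the thread does not say which exact patch release carries the fix on each distribution, so passing this check is a hint rather than proof that the fix is present.

```shell
#!/usr/bin/env bash
# Returns 0 when version $1 is greater than or equal to version $2.
version_at_least() {
  local have="$1" want="$2"
  # sort -V orders version strings; if the smallest one is $want,
  # then $have is at least $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

kernel="$(uname -r)"   # for example 5.4.0-91-generic

# Strip the distribution suffix after the first dash before comparing.
if version_at_least "${kernel%%-*}" "5.4"; then
  echo "kernel ${kernel} is at least 5.4"
else
  echo "kernel ${kernel} is older than 5.4"
fi
```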
Hi, I used version v2.287 of the GitHub runner as a workaround for the reported issue. Everything was fine, but as of today GitHub won't process my jobs because it says that I have an unsupported version. I've updated to version Do you have any solution for that? |
do you have more details/logs? |
GH runner version:
This is the result of
|
Behaviour
Our runners were auto-updated this morning from v2.284.0 to v2.285.0 and began to hang on the Download and Install Buildx step:
Steps to reproduce this issue
docker/setup-buildx-action@v1 in your workflow
Expected behaviour
Actual behaviour
Configuration
Note: all steps prior to the failing step pass as expected
Logs
Our logs are littered with private implementation details. Could you help me isolate which part of the logs you need for diagnosis?