kola iscsi tests fail intermittently because of 502 error when pulling container #1866

Closed
dustymabe opened this issue Jan 24, 2025 · 4 comments

Comments

@dustymabe
Member

For example, the iso-offline-install-iscsi.ibft-with-mpath.bios test fails without much information about what happened. If you look at nested_vm_console.txt, you'll see:

Trying to pull quay.io/coreos-assembler/coreos-assembler:latest...
Pulling image //quay.io/coreos-assembler/coreos-assembler:latest inside systemd: setting pull timeout to 5m0s
Error: copying system image from manifest list: determining manifest MIME type for docker://quay.io/coreos-assembler/coreos-assembler:latest: reading manifest sha256:9d5c60213547ca29dda408ed4b26313d0d5ff390b9beb0cbdfdc2d6866b33eb9 in quay.io/coreos-assembler/coreos-assembler: received unexpected HTTP status: 502 Bad Gateway

This is coming from this unit

I think this is fundamentally a problem that needs to be fixed in podman, so that it retries on 5xx errors such as this 502; see containers/common#2299
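
To illustrate the kind of retry behavior being asked for, here is a minimal Python sketch. It is not podman's or containers/common's code; `fetch_with_retry`, its parameters, and the `fetch` callable are hypothetical names used only for illustration. The idea is to treat any 5xx status as transient and retry with exponential backoff, while returning other statuses immediately.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns a non-5xx status or attempts run out.

    fetch is a hypothetical callable returning (http_status, body).
    The last (status, body) pair is returned either way, so the caller
    can still see a persistent 502.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        # Anything outside 500-599 is not a server-side error: stop retrying.
        if not 500 <= status < 600:
            return status, body
        # Back off exponentially before the next try: 1x, 2x, 4x, ...
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return status, body
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.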

@jbtrystram
Contributor

Maybe we can add an image quadlet definition with some systemd retry directives to work around that.
I'm surprised systemd does not retry the service before marking it failed, though.
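
For context, systemd only restarts a service when the unit asks for it (the default is `Restart=no`), which would explain the lack of retries. Below is a rough Python sketch that renders what such an image quadlet might look like. The function name and default values are hypothetical, and the rendered unit is an untested assumption about how the workaround could be written, not the configuration used by the tests.

```python
def render_image_quadlet(
    image: str,
    restart_sec: int = 30,
    burst: int = 5,
    interval_sec: int = 600,
) -> str:
    """Render a hypothetical .image quadlet that retries a failed pull.

    Restart=, RestartSec=, StartLimitIntervalSec= and StartLimitBurst=
    are standard systemd directives; the values here are illustrative.
    """
    lines = [
        "[Unit]",
        # Give up after `burst` start attempts within `interval_sec` seconds.
        f"StartLimitIntervalSec={interval_sec}",
        f"StartLimitBurst={burst}",
        "",
        "[Image]",
        f"Image={image}",
        "",
        "[Service]",
        # Retry only when the pull fails, waiting between attempts.
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_image_quadlet("quay.io/coreos-assembler/coreos-assembler:latest")` returns text that could be written to a `.image` file for quadlet to process.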

@jlebon
Member

jlebon commented Jan 30, 2025

Dusty and I discussed this a bit this morning. We mentioned injecting kola in to avoid having to pull down cosa, but that'd run into a userspace mismatch issue. Dusty had the idea of injecting it in the targetcli image which we already have to pull anyway. That makes sense to me. E.g. start a second instance of that image with a bind-mount to mount kola into /usr/bin. Though for netbooting, I think we'd need to add a few packages to the targetcli image.

We'd still be dependent on Quay, but having to pull way less should hopefully make it less flaky and the test faster.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Jan 31, 2025
In this case we'll run the podman container with --rootfs instead
of pulling a full 4+ GiB COSA image from quay for `kola qemuexec`.

This saves us quite a bit of time and bandwidth usage during a
pipeline run because we have more than 1 iscsi test.

This should also take care of coreos/fedora-coreos-tracker#1866
because we are no longer pulling this particular container from quay.
@dustymabe dustymabe self-assigned this Jan 31, 2025
@dustymabe dustymabe added the jira for syncing to jira label Jan 31, 2025
@dustymabe
Member Author

This proposal mounts the COSA rootfs into the VM over virtiofs and then uses podman --rootfs to run from it, so we don't need to download a container image at all: coreos/coreos-assembler#4013
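
As a rough sketch of the idea (not the code from the PR), the Python helper below builds a `podman run --rootfs` command line that starts a container from an already-mounted root filesystem instead of a pulled image. The helper name and the mount path in the usage note are hypothetical.

```python
from typing import List, Sequence


def podman_rootfs_argv(rootfs: str, command: Sequence[str]) -> List[str]:
    """Build argv for running `command` from an exploded rootfs directory.

    With --rootfs, podman treats the first positional argument as a
    directory on the filesystem rather than an image name, so nothing
    is pulled from a registry.
    """
    if not command:
        raise ValueError("command must not be empty")
    return ["podman", "run", "--rm", "--rootfs", rootfs, *command]
```

For example, `podman_rootfs_argv("/run/cosa-rootfs", ["kola", "qemuexec"])` yields an argv that could be passed to `subprocess.run`, assuming the COSA rootfs has been mounted at that (hypothetical) path.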

nikita-dubrovskii pushed a commit to coreos/coreos-assembler that referenced this issue Jan 31, 2025
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Jan 31, 2025
(cherry picked from commit 8dbfe3e)
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Jan 31, 2025
(cherry picked from commit 8dbfe3e)
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Jan 31, 2025
(cherry picked from commit 8dbfe3e)
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Jan 31, 2025
(cherry picked from commit 8dbfe3e)
@dustymabe
Member Author

Closing this since coreos/coreos-assembler#4013 merged

marmijo pushed a commit to marmijo/coreos-assembler that referenced this issue Feb 4, 2025
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 4, 2025