
Segment fault with Ubuntu 24.04 20250120.5.0 #11471

Closed
2 of 16 tasks
RadxaYuntian opened this issue Jan 26, 2025 · 18 comments

@RadxaYuntian

Description

We have a scheduled job that runs every Sunday. It failed today, with no code changes in the last two weeks.

After checking the build log, we found it always failed during a DKMS package installation. Once the workflow file was changed to print the DKMS log, the error was always a gcc segmentation fault.

Changing the running environment to ubuntu-22.04 fixed the segment fault. The action still failed, but only because of the changes we made to investigate this issue.

What may be unusual in our case is that we use binfmt to run aarch64 gcc in a devcontainer, because the final output is an aarch64 system image. So this is not an ordinary native gcc invocation failing.
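For context, this is roughly how QEMU user-mode emulation is usually registered on a runner so that aarch64 binaries (including aarch64 gcc) can execute on the x86_64 host. A minimal sketch only: action versions are illustrative, and our actual setup lives in a devcontainer rather than in workflow steps.

```yaml
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      # Registers QEMU user-mode emulators through binfmt_misc so that
      # foreign-architecture binaries run transparently on the host.
      - uses: docker/setup-qemu-action@v3
        with:
          platforms: arm64
```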

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • macOS 15
  • macOS 15 Arm64
  • Windows Server 2019
  • Windows Server 2022
  • Windows Server 2025

Image version and build link

20250120.5.0

Is it regression?

20250105.1.0: https://github.com/RadxaOS-SDK/rsdk/actions/runs/12848906725

Expected behavior

DKMS installs successfully, without a gcc segfault.

Actual behavior

gcc segfault:

   2025-01-26 07:47:36,252 bdebstrap ERROR: mmdebstrap failed with exit code 25. See above for details.
  
  /workspaces/rsdk
  
  DKMS make.log for radxa-overlays-0.1.20 for kernel 6.1.68-2-stable (aarch64)
  Sun Jan 26 07:47:16 UTC 2025
  make: Entering directory '/usr/src/linux-headers-6.1.68-2-stable'
  Segmentation fault (core dumped)
  warning: the compiler differs from the one used to build the kernel
    The kernel was built by: aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
    You are using:           gcc (Debian 12.2.0-14) 12.2.0
    CC [M]  /var/lib/dkms/radxa-overlays/0.1.20/build/radxa-overlays.o
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-gpu.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-hdmi.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo
  gcc: internal compiler error: Segmentation fault signal terminated program cc1
  Please submit a full bug report, with preprocessed source (by using -freport-bug).
  See <file:///usr/share/doc/gcc-12/README.Bugs> for instructions.
  make[2]: *** [scripts/Makefile.lib:409: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo] Error 4
  make[1]: *** [scripts/Makefile.build:500: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays] Error 2
  make[1]: *** Waiting for unfinished jobs....

Repro steps

  1. Clone https://github.com/RadxaOS-SDK/rsdk
  2. Cherry pick RadxaYuntian/rsdk@090908a to view dkms log
  3. Trigger workflow_dispatch for build.yaml
@RadxaYuntian
Author

RadxaYuntian commented Jan 26, 2025

The gcc version Debian 12.2.0-14 was released on 2023/01/08, so the last successful run (2025/01/19) and today's failed run both used the same compiler version in the devcontainer.

@deviantintegral

I can confirm this as well at https://github.com/pbkhrv/rtl_433-hass-addons/actions/runs/12972957498/job/36181006667. That job compiles for aarch64 in Docker under QEMU (I know, proper cross-compiling would be better, but this is what the official Home Assistant builder action does, so 🤷).

Is there a way to specify the runner image version to a previous 24.04 release to confirm the regression?

@woblerr

woblerr commented Jan 26, 2025

The same problem occurs with buildx building for linux/arm64 via QEMU: https://github.com/woblerr/docker-pgbackrest/actions/runs/12965488407/job/36165276019#step:7:2658

Rolling back to the ubuntu-22.04 runner solved the problem.
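For reference, the failing path here is roughly this shape of workflow: an emulated linux/arm64 image build via buildx. This is a hedged sketch, not the exact workflow; action versions and parameters are illustrative.

```yaml
steps:
  - uses: actions/checkout@v4
  # Register QEMU emulators so foreign-architecture build stages can run.
  - uses: docker/setup-qemu-action@v3
  # Create a buildx builder capable of multi-platform builds.
  - uses: docker/setup-buildx-action@v3
  - uses: docker/build-push-action@v6
    with:
      platforms: linux/arm64   # compiled under QEMU on the x86_64 runner
      push: false
```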

@MyreMylar

Chiming in to say that we are seeing segfaults on our test runners for pygame-ce in the ppc64le architecture build since getting version 20250120.5.0. Perhaps related, it is also reporting that it can no longer detect the GNU compiler type for our s390x architecture build.

As @deviantintegral says, it would be nice to have a way to roll back to a previous runner image to isolate the problem.

@RaviAkshintala
Contributor

Hi @RadxaYuntian, thank you for bringing this issue to our attention. We will look into it and update you after investigating.

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue Jan 27, 2025
Due to an
[issue](actions/runner-images#11471)
with Ubuntu 24.04 20250120.5.0 runner image
we have been seeing failures in our multi-arch images for
the last few days which is blocking the release. I assume that
the issue is something related to qemu, so downgrade to 22.04
until this issue is resolved.

Signed-off-by: stevenhorsman <[email protected]>
@BrianPugh

I'm also having very similar issues in tamp when using cibuildwheel to build Python wheels for ppc64le and aarch64 targets.

@rtobar

rtobar commented Jan 28, 2025

Same issue here with a gcc segfault, but in my case I saw it with both ubuntu-latest and ubuntu-20.04. Switching to ubuntu-22.04 solved it, as other people mentioned.

stevenhorsman added a commit to confidential-containers/cloud-api-adaptor that referenced this issue Jan 28, 2025
Due to an
[issue](actions/runner-images#11471)
with Ubuntu 24.04 20250120.5.0 runner image
we have been seeing failures in our multi-arch images for
the last few days which is blocking the release. I assume that
the issue is something related to qemu, so downgrade to 22.04
until this issue is resolved.

Signed-off-by: stevenhorsman <[email protected]>
charlesomer referenced this issue in mikebrady/shairport-sync Jan 28, 2025
Earlopain added a commit to docker-ruby-nightly/ruby that referenced this issue Feb 4, 2025
I'm getting tired of these failures. I thought it would be addressed soonish but apparently not.
actions/runner-images#11471
drivebyer added a commit to OT-CONTAINER-KIT/redis that referenced this issue Feb 5, 2025
@RadxaYuntian
Author

We are building a Debian 12 system image that contains a DKMS package, and DKMS itself is called by the kernel postinst hooks. So this is not some intermediate environment where we are free to install any software: we need to clean up afterwards as well. Also, there are no gcc-10 packages in the Debian 12 repositories, so we would have to build gcc-10 from source with the shipped gcc-12, get rid of gcc-12 without breaking package dependencies in order to install gcc-10 as the default compiler, then clean up gcc-10 and reinstall gcc-12.

I think this is too much effort for a temporary workaround.

We will roll back to 22.04 for now.
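For anyone applying the same workaround: the rollback itself is a one-line change in the workflow file. A sketch; the job name is illustrative.

```yaml
jobs:
  build:
    # Pin to ubuntu-22.04 instead of ubuntu-24.04 / ubuntu-latest
    # until the 24.04 image regression is resolved.
    runs-on: ubuntu-22.04
```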

XuehaiPan added a commit to XuehaiPan/optree that referenced this issue Feb 6, 2025
Segfault caused by ubuntu-24.04 image update. Switch back to ubuntu-22.04.

See also:

- actions/runner-images#11471
@baszoetekouw

> I believe this is because of a mismatch between gcc and the kernel version (Kernel Version: 6.8.0-1020-azure). This kernel version is only supported with gcc 10. I request, if possible, that you try installing gcc 10 and retry the test, or proceed with the ubuntu-22 image until a new version is released. Thank you!

I'm also seeing these issues with gcc-10 (e.g, https://github.com/OpenConext/OpenConext-BaseContainers/actions/runs/13181115842/job/36791848196), so I don't think the compiler version is the problem.

@baszoetekouw

To add to the discussion: the failures seem to be a bit random, in the sense that builds reliably fail, but the exact spot where the failure occurs changes from build to build. I've been trying to work around this issue by using different compilers (multiple versions of gcc and clang for different parts of the build), only to see parts of the build fail that were fine in previous runs.

@baszoetekouw

Another (and much nicer) workaround I've found is to use the new ubuntu-24.04-arm runner to build arm64 stuff. An added bonus is that those are quite a bit faster than QEMU.
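A sketch of that approach: build arm64 natively on the arm64 runner and keep x86_64 on the standard runner, so no QEMU emulation is involved. Matrix keys and job names are illustrative.

```yaml
jobs:
  build:
    strategy:
      matrix:
        include:
          - arch: amd64
            runner: ubuntu-24.04
          - arch: arm64
            runner: ubuntu-24.04-arm   # native arm64 runner, no QEMU needed
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
```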

@kishorekumar-anchala
Contributor

Hi @RadxaYuntian - kindly raise the issue in the appropriate repo. Thanks, closing the issue.

@deviantintegral

Is this fixed or not? This issue isn't with partner images; it's with the official GitHub ones. If it's fixed, what runner image release version will contain the fix?

@baszoetekouw

@kishorekumar-anchala why are you closing this? This issue is about the official GitHub ubuntu-24.04 runners, not about the partner-provided arm runners. I was only pointing to those as a workaround for the problems with the GitHub-provided runners.

@RadxaYuntian
Author

RadxaYuntian commented Feb 8, 2025

I retriggered the failed workflow and the issue is not resolved. @RaviAkshintala please reopen this issue.

@woblerr

woblerr commented Feb 8, 2025

> Hi @RadxaYuntian - kindly raise the issue in the appropriate repo. Thanks, closing the issue.

This issue should stay here as it relates to the official image on GitHub. If this is not the case, then the restrictions should be specified in the image documentation.
