Upgrade Linux kernel from 6.6 to 6.12 #2300

Draft
wants to merge 25 commits into
base: main

Contributor

@ader1990 ader1990 commented Sep 10, 2024

Upgrade the Linux kernel from the 6.6.y stable branch to the 6.12.y stable branch (once it is released).

See: flatcar/Flatcar#1527

This PR is mainly meant to surface any major blockers before the new 6.12 LTS release.

Currently, the upstream Gentoo package provides 6.10 and 6.11.

Tested 6.10.y and it works as expected.

Now testing 6.11.y.

Testing done

[Describe the testing you have done before submitting this PR. Please include both the commands you issued as well as the output you got.]

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

Boot partition size:

arm64: /dev/nvme0n1p1     129039    63368     65672  50% /boot
amd64: /dev/vda1          129039    62852     66187  49% /boot

@ader1990
Contributor Author

ZFS 2.2.5 does not support kernel 6.10. The ZFS upgrade patches will be dropped once the portage-stable update PR (which brings ZFS 2.2.6) is merged: #2298

github-actions bot commented Sep 10, 2024

@@ -36,6 +36,5 @@ IUSE=""
# local patches overlap with the upstream patch.
UNIPATCH_LIST="
${PATCH_DIR}/z0001-kbuild-derive-relative-path-for-srctree-from-CURDIR.patch \
${PATCH_DIR}/z0002-revert-pahole-flags.patch \
Member

Have you tested this?
When pahole is executed with -j (parallel), the BTF metadata order is non-deterministic, so the built kernel and modules don't match.

It doesn't have to be a revert, but we need to carry some patch (unless something significant changed).

Contributor Author

Definitely, working on it. The pahole flags moved to scripts/Makefile.btf, so that needs to be addressed; I am working on a patch now.

Member

We recently updated pahole to a newer version (1.27) that was supposed to be reproducible regardless of how many threads it uses, but dropping the patch didn't work for me.

Member

Seems like we may still need a kernel patch to pass --btf_features=all,reproducible_build: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=v1.27&id=43bd3efa85656565129063cdd6dd7499e44a7867

Member

this could be upstreamed

Contributor Author

I will test it asap and send it to LKML if it works.

Contributor Author

Added the reproducible_build flag to the pahole params, although I don't like how the addition is done; that file is a beautiful soup and needs better management.
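
For illustration only, the idea of appending reproducible_build to an existing pahole feature list can be sketched as a small shell helper. The function name and the sample flag strings are hypothetical; the actual change in this PR is a patch against scripts/Makefile.btf, which is not reproduced here.

```shell
# Hypothetical helper (not part of the kernel build system or of this PR):
# append ",reproducible_build" to an existing --btf_features= list.
add_reproducible_build() {
    local flags=$1
    case "$flags" in
        # Already requested: leave the flags untouched.
        *reproducible_build*)
            printf '%s\n' "$flags" ;;
        # Extend the first --btf_features= list found in the flags.
        *--btf_features=*)
            printf '%s\n' "$flags" |
                sed 's/\(--btf_features=[^ ]*\)/\1,reproducible_build/' ;;
        # No feature list present: do not guess one, return the flags as-is.
        *)
            printf '%s\n' "$flags" ;;
    esac
}

# Example with an illustrative flag string:
add_reproducible_build '-j --btf_features=encode_force,var'
# prints: -j --btf_features=encode_force,var,reproducible_build
```

The helper deliberately does nothing when no --btf_features= option is present, since passing a feature list containing only reproducible_build could change which BTF features get encoded.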

@jepio
Member

jepio commented Sep 10, 2024

Things like this make me want to hold off on upgrading to 6.10:
"[regression] significant delays when secureboot is enabled since 6.10" https://lore.kernel.org/lkml/[email protected]/T/#mb17f32470541d54f7ee45987d510aa45b7557969

It takes a couple of minor releases on a new stable branch before it is ready to make its way into Flatcar.

@ader1990 ader1990 marked this pull request as draft September 10, 2024 10:36
@ader1990
Contributor Author

Adding the blocker bug here: https://bugzilla.kernel.org/show_bug.cgi?id=219229

Possible resolution from the bug discussion, to be set in the kernel config:

CONFIG_TCG_TPM2_HMAC=n
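
To confirm whether a built image actually has the option turned off, a small check against the generated kernel config can help. This is a minimal sketch: the function name is hypothetical, and it assumes a standard kernel .config layout, where disabled options appear as "# CONFIG_X is not set" rather than "=n".

```shell
# Hypothetical helper: report the state of one option in a kernel config file.
kconfig_state() {
    local config_file=$1 option=$2
    if grep -q "^${option}=[ym]" "$config_file"; then
        echo enabled
    elif grep -q "^# ${option} is not set" "$config_file"; then
        echo disabled
    else
        # The option does not appear at all (e.g. unknown to this kernel version).
        echo absent
    fi
}

# Usage (the file name is an assumption, taken from the build artifacts):
#   kconfig_state flatcar_production_image_kernel_config.txt CONFIG_TCG_TPM2_HMAC
```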

@ader1990
Contributor Author

The CONFIG_TCG_TPM2_HMAC feature was introduced in 6.10 as an extra security layer: https://github.com/torvalds/linux/blob/master/drivers/char/tpm/Kconfig#L37

@ader1990
Contributor Author

Managed to get the ARM64 image built, but the AMD64 image fails at the initrd/grub stage with the error: cpio: premature end of file.

Full error below:

2024-09-11T13:33:24.1344201Z INFO    grub_install.sh: Installing GRUB x86_64-xen in flatcar_production_image.bin
2024-09-11T13:33:24.1537425Z INFO    grub_install.sh: Compressing modules in flatcar/grub/x86_64-xen
2024-09-11T13:33:25.2833839Z INFO    grub_install.sh: Generating flatcar/grub/x86_64-xen/load.cfg
2024-09-11T13:33:25.3866147Z INFO    grub_install.sh: Generating flatcar/grub/x86_64-xen/core.elf
2024-09-11T13:33:25.4519108Z INFO    grub_install.sh: Installing default x86_64 Xen bootloader.
2024-09-11T13:33:25.5266195Z INFO    grub_install.sh: Elapsed time (grub_install.sh): 0m2s
2024-09-11T13:33:25.5754372Z INFO    build_image: Generating flatcar_production_image_pcr_policy.zip
2024-09-11T13:33:25.8790423Z INFO    build_image: Writing flatcar_production_image_contents.txt
2024-09-11T13:33:26.7383193Z INFO    build_image: Writing flatcar_production_image_contents_wtd.txt
2024-09-11T13:33:26.9908326Z cpio: premature end of file
2024-09-11T13:33:26.9914934Z rmdir: failed to remove '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4089.0.0+nightly-20240910-2100-12-g6dd0a5b3f7-a1/tmp_initrd_contents/rootfs-0': Directory not empty
2024-09-11T13:33:27.0063435Z ERROR   build_image: script called: build_image '--board=amd64-usr' '--group=developer' '--output_root=/home/sdk/trunk/src/scripts/artifacts' 'prodtar' 'container' 'sysext'
2024-09-11T13:33:27.0069179Z ERROR   build_image: Backtrace:  (most recent call is last)
2024-09-11T13:33:27.0086074Z ERROR   build_image:   file build_image, line 173, called: create_prod_image 'flatcar_production_image.bin' 'base' 'developer' 'coreos-base/coreos' 'containerd-flatcar:app-containers/containerd,docker-flatcar:app-containers/docker&app-containers/docker-cli&app-containers/docker-buildx'
2024-09-11T13:33:27.0103055Z ERROR   build_image:   file prod_image_util.sh, line 169, called: finish_image 'flatcar_production_image.bin' 'base' '/home/sdk/trunk/src/scripts/artifacts/amd64-usr/developer-4089.0.0+nightly-20240910-2100-12-g6dd0a5b3f7-a1/rootfs' 'flatcar_production_image_contents.txt' 'flatcar_production_image_contents_wtd.txt' 'flatcar_production_image.vmlinuz' 'flatcar_production_image_pcr_policy.zip' 'flatcar_production_image.grub' 'flatcar_production_image.shim' 'flatcar_production_image_kernel_config.txt' 'flatcar_production_image_initrd_contents.txt' 'flatcar_production_image_initrd_contents_wtd.txt' 'flatcar_production_image_disk_usage.txt'
2024-09-11T13:33:27.0112878Z ERROR   build_image:   file build_image_util.sh, line 903, called: die_err_trap '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' '1'
2024-09-11T13:33:27.0118365Z ERROR   build_image: 
2024-09-11T13:33:27.0124923Z ERROR   build_image: Command failed:
2024-09-11T13:33:27.0132629Z ERROR   build_image:   Command '"${BUILD_LIBRARY_DIR}/extract-initramfs-from-vmlinuz.sh" "${root_fs_dir}/boot/flatcar/vmlinuz-a" "${BUILD_DIR}/tmp_initrd_contents"' exited with nonzero code: 1

@ader1990
Contributor Author

Successful build for AMD64:

root@localhost ~ # uname -a
Linux localhost 6.10.9-flatcar #1 SMP PREEMPT_DYNAMIC Wed Sep 11 17:33:15 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz GenuineIntel GNU/Linux
root@localhost ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4089.0.0+nightly-20240910-2100-14-g5595c96aa4
VERSION_ID=4089.0.0
BUILD_ID=nightly-20240910-2100-14-g5595c96aa4
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4089.0.0+nightly-20240910-2100-14-g5595c96aa4 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4089.0.0+nightly-20240910-2100-14-g5595c96aa4:*:*:*:*:*:*:*"

@ader1990
Contributor Author

The amd64 bpf.execsnoop mantle test should be fixed by a new iovisor/bcc image (iovisor/bcc@5d2ef17). I triggered an image update: https://github.com/flatcar/mantle/actions/runs/10832754186/job/30057878029.

@ader1990 ader1990 force-pushed the ader1990/linux_kernel_6_10 branch from 5595c96 to 4ad039e Compare September 17, 2024 13:27
@ader1990
Contributor Author

@t-lo I noticed that, starting with Linux kernel 6.10, one of the Hyper-V daemon binaries has been renamed (see torvalds/linux@82b0945). Should we keep the same systemd unit name anyway?

I wonder how https://github.com/microsoft/azurelinux will handle it (I have not seen any patch yet).

I am torn between the approach in 4ad039e and changing the name everywhere.

@t-lo
Member

t-lo commented Sep 17, 2024

I think we should rename the systemd service to prevent confusion down the road.

@ader1990
Contributor Author

The thing is that the binaries do the same thing and have the same interface, but internally use a different implementation (uio_hv_generic). The odd part is that the old implementation is still present, but its build is disabled.

I will add a new service definition for the new version (it also has a different device path trigger), to keep things separate.

@ader1990
Contributor Author

The /boot partition is very close to a critical level: 49% is already used, leaving only around 1.5MB before the 50% mark:

/dev/vda1          129039    62852     66187  49% /boot
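
The remaining headroom can be worked out from the df figures. The sketch below assumes that the "critical level" is the 50% mark (that threshold is an assumption, as is the function name) and takes the total and used columns in 1K blocks.

```shell
# Hypothetical helper: headroom in KB before a usage limit (default 50%).
# Arguments: total size and used size in 1K blocks, as printed by df.
boot_headroom_kb() {
    local total_kb=$1 used_kb=$2 limit_percent=${3:-50}
    echo $(( total_kb * limit_percent / 100 - used_kb ))
}

boot_headroom_kb 129039 62852   # amd64 figures above, prints 1667
boot_headroom_kb 129039 63368   # arm64 figures from the PR description, prints 1151
```

Under that assumption, amd64 has roughly 1.6MB left and arm64 roughly 1.1MB.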

@ader1990
Contributor Author

Note: on the AMD64 vmlinuz-a, build_library/extract-initramfs-from-vmlinuz.sh now fails because the script finds the corrupted CPIO first. More debugging is needed to find out why this happens in the first place (what has changed upstream).
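
As a debugging aid, candidate cpio archive starts can be listed by searching for the newc header magic. This is only a sketch and does not reproduce what extract-initramfs-from-vmlinuz.sh does: the function name is hypothetical, it assumes GNU grep, and it assumes the blob being searched is already decompressed.

```shell
# Hypothetical helper: print the byte offset of every newc cpio header magic
# ("070701", or "070702" for the CRC variant) found in a file.
cpio_magic_offsets() {
    # -a: treat binary as text, -b: print byte offset, -o: print matches only
    grep -a -b -o '07070[12]' "$1" | cut -d: -f1
}

# Usage: cpio_magic_offsets decompressed_blob.bin
```

Several offsets in a row usually belong to consecutive entries of the same archive, so large gaps between offsets are the more interesting places to look.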

@ader1990 ader1990 self-assigned this Sep 20, 2024
@ader1990 ader1990 force-pushed the ader1990/linux_kernel_6_10 branch from 06de6c8 to f62e9ba Compare September 24, 2024 12:55
@ader1990 ader1990 changed the title Upgrade Linux kernel from 6.6 to 6.10 Upgrade Linux kernel from 6.6 to 6.12 Oct 28, 2024
@ader1990
Contributor Author

Currently blocked by the open-zfs package, which does not yet support 6.12.
