OCI Image boot quirks or "The road to FROM fedora:40"
#2
Comments
Right now, the following permission differences have been identified between booting an OCI image after stripping the base commit and an OCI image with a base commit:
The following dirs have different permissions:
chmod 750 ./usr/etc/audit
chmod 750 ./usr/etc/audit/rules.d
chmod 755 ./usr/etc/bluetooth
chmod 750 ./usr/etc/dhcp
chmod 750 ./usr/etc/firewalld
chmod 700 ./usr/etc/grub.d
chmod 700 ./usr/etc/nftables
chmod 700 ./usr/etc/nftables/osf
chmod 555 ./usr/etc/pki/ca-trust/extracted/pem/directory-hash
chmod 750 ./usr/etc/polkit-1/rules.d
chmod 700 ./usr/etc/ssh/sshd_config.d
chmod 700 ./usr/lib/containers/storage/overlay-images
chmod 700 ./usr/lib/containers/storage/overlay-layers
chmod 700 ./usr/lib/ostree-boot/efi
chmod 700 ./usr/lib/ostree-boot/efi/EFI
chmod 700 ./usr/lib/ostree-boot/efi/EFI/BOOT
chmod 700 ./usr/lib/ostree-boot/efi/EFI/fedora
chmod 700 ./usr/lib/ostree-boot/grub2
chmod 700 ./usr/lib/ostree-boot/grub2/fonts
chmod 750 ./usr/libexec/initscripts/legacy-actions/auditd
These make systemd panic and leave sddm unable to launch.
The following bins have different capabilities that have been stripped (probably due to ostree-rs-ext's gzip encoding):
setcap cap_dac_override,cap_net_admin,cap_net_raw=eip ./usr/bin/dumpcap
setcap cap_sys_nice=ep ./usr/bin/kwin_wayland
setcap cap_setgid=ep ./usr/bin/newgidmap
setcap cap_setuid=ep ./usr/bin/newuidmap
setcap cap_net_bind_service=ep ./usr/bin/rcp
setcap cap_net_bind_service=ep ./usr/bin/rlogin
setcap cap_net_bind_service=ep ./usr/bin/rsh
The following dirs lose the polkitd group perm since polkitd is no longer in
chgrp $POLKIT_ID ./usr/etc/polkit-1/localauthority
chgrp $POLKIT_ID ./usr/etc/polkit-1/rules.d
This causes polkit to not work
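A list like the chmod one above could be produced by comparing two unpacked root filesystems. The following is only a rough sketch, not what rechunk actually does: `diff_dir_modes` is a hypothetical helper, both arguments are assumed to be paths to extracted trees, the first tree is assumed to hold the modes we want, and GNU `find`, `sort` and `stat` are assumed to be available.

```shell
# Sketch: print the chmod commands needed to make directories in FINAL_ROOT
# match the modes they have in BASE_ROOT. Both arguments are hypothetical
# paths to unpacked root filesystems. Assumes GNU find, sort and stat.
diff_dir_modes() {
    local base_root=$1 final_root=$2 dir base_mode final_mode
    # Walk every directory of the reference tree in a stable order.
    (cd "$base_root" && find . -mindepth 1 -type d -print0 | sort -z) |
    while IFS= read -r -d '' dir; do
        # Skip directories that are absent from the final tree.
        [ -d "$final_root/$dir" ] || continue
        base_mode=$(stat -c '%a' "$base_root/$dir")
        final_mode=$(stat -c '%a' "$final_root/$dir")
        if [ "$base_mode" != "$final_mode" ]; then
            printf 'chmod %s %s\n' "$base_mode" "$dir"
        fi
    done
}
```

Called as `diff_dir_modes /path/to/base /path/to/final`, it prints one `chmod MODE ./path` line per differing directory, in the same shape as the list above. It does not look at capabilities or group ownership.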
Both rpm-ostree and the bootc PR do not do the following (when the additions are through OCI):
Remove
rm -rf \
./etc/.pwd.lock \
./etc/passwd- \
./etc/group- \
./etc/shadow- \
./etc/gshadow- \
./etc/subuid- \
./etc/subgid- \
./.dockerenv
Update
They do not handle
Bootc does not merge
rpm-ostree stashes around 300 MB of data in
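For anyone who wants to apply the removal step to an unpacked image by hand, here is a minimal sketch. `prune_oci_leftovers` is a hypothetical helper name and ROOT is assumed to be the path of an extracted root filesystem; the file list is the one from the rm -rf command above and nothing else.

```shell
# Sketch: remove the container-build leftovers listed above from an unpacked
# root filesystem. ROOT is a hypothetical path; nothing outside it is touched.
prune_oci_leftovers() {
    # Refuse to run without an argument so we never delete from "/".
    local root=${1:?usage: prune_oci_leftovers ROOT}
    rm -rf -- \
        "$root/etc/.pwd.lock" \
        "$root/etc/passwd-" \
        "$root/etc/group-" \
        "$root/etc/shadow-" \
        "$root/etc/gshadow-" \
        "$root/etc/subuid-" \
        "$root/etc/subgid-" \
        "$root/.dockerenv"
}
```

The `${1:?...}` guard is there because an empty ROOT would otherwise turn the paths into `/etc/...` on the host.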
This relates strongly to https://gitlab.com/fedora/bootc/base-images-experimental and https://gitlab.com/fedora/bootc/tracker/-/issues/32
There's a whole lot going on in this project (thanks for starting it!)...I think though we are going to need to tease some of these sub-problems apart and tackle them more clearly individually. Especially the:
part. BTW, when I was looking at this, one of the fundamental "sand in the gears" problems going on here is containers/buildah#5592, and really the only way to work around that today is to step "outside" and reserialize (or at least fix up) the tar streams generated by podman/docker. (Yes, we should fix that bug.)
Let's use this issue to track quirks inherent to booting an OCI image. Hopefully, when this issue closes it will be possible for someone to boot an image made with FROM fedora:40.
First, let's begin with some background about the quirks.
Background
Currently, an OSTree-based system can only boot an OSTree commit. OSTree commits are essentially a serialization format for a filesystem, similar to a tarball, with the benefit that they can be deduplicated at the file level.
To make that directory bootable and memoryless ("without hysteresis"), the OSTree project contains a variety of setup steps, in which, e.g., the initramfs is generated and placed in /usr/lib, /etc files are moved to /usr/etc, etc.
These steps are done using the tool rpm-ostree through its image generation backend, and can currently only be done with that tool. In addition, rpm-ostree contains a couple of systemd services that fix up OS quirks (e.g., generating /var from a location called var factory).
Then, the filesystem is wrapped into a commit and placed on an HTTP2-enabled server, where users can download new system files when an update happens.
While revolutionary, this system had the following disadvantages:
OCI extension
Therefore, ostree-rs-ext was developed with a new serialization format, which converts an OSTree commit to an OCI image. This standard embeds the OSTree commit as an OSTree repository with xattr format in the /sysroot/ostree directory. Then, as the commit is written to the tar stream, the ostree files are hardlinked to the location they would have in the system (e.g., /usr/etc OSTree files are hardlinked to /etc).
The benefit of this format is that it makes it possible to run the result as a container and extend it.
A trivial compression format splits this across 64 layers to make it easier to download and make some bandwidth savings possible.
When rpm-ostree receives that image, it first checks whether it is a commit that has not been extended. If it has not been extended, it imports it as usual. If it has been extended, it imports the OSTree layers as an original "base" commit. The directory permissions are also sourced from the commit, and these can differ (and in practice do) from the ones in the final container.
Then, for the extension layers, OSTree converts them to small commits on the fly, by using the base commit for SELinux labelling and moving /etc files to /usr/etc. This means that any extensions added over OCI have not been postprocessed and have quirks.
For example, the /etc/passwd file has drift. And since only the base commit is used for SELinux labelling, any package additions with custom SELinux rules break.
And of course, if there is no base commit, rpm-ostree will not load the image.
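To make the "/etc files are moved to /usr/etc" step concrete, here is a rough sketch of what it amounts to on an unpacked root filesystem. This is not the rpm-ostree or ostree-rs-ext implementation: `merge_etc_into_usr_etc` is a hypothetical helper, ROOT is an assumed path to an extracted tree, and the "file from /etc wins on a name clash" policy is an assumption. It relies on `cp -a`, so GNU coreutils is assumed, and SELinux labels are not handled at all.

```shell
# Sketch: fold ROOT/etc into ROOT/usr/etc so that only /usr/etc remains.
# Assumption: on a name clash, the file from /etc overwrites the one in
# /usr/etc. ROOT is a hypothetical path to an unpacked root filesystem.
merge_etc_into_usr_etc() {
    local root=${1:?usage: merge_etc_into_usr_etc ROOT}
    # Nothing to do if /etc is missing or is a symlink.
    [ -d "$root/etc" ] && [ ! -L "$root/etc" ] || return 0
    mkdir -p "$root/usr/etc"
    # Copy the contents (dotfiles included), preserving modes and links.
    cp -a "$root/etc/." "$root/usr/etc/"
    rm -rf -- "$root/etc"
}
```

After running it on a tree that has both directories, only /usr/etc is left, which is the layout OSTree expects.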
Bootc
Now, bootc comes along and formalizes the notion of OCI as OS images. Initially, it uses ostree-rs-ext to do the unencapsulation. However, soon it will use podman to pull and expand the container, which is then fed to OSTree (containers/bootc#215). This solves the SELinux issues but introduces a set of new ones.
The codebase of that PR was referenced when building rechunk and, surprisingly, the resulting image did not boot. Therefore, when that PR merges bootc will stop being able to boot extended images.
Why?
A lot of minor reasons.
Because the OCI container might have wrong permissions in certain systemd dirs, which makes it fail to boot. Maybe the container has both an /etc and a /usr/etc dir, which OSTree does not like at all, but due to the way rpm-ostree is implemented right now it works (/etc files are transparently merged into /usr/etc). Maybe the polkitd folder lost the polkitd group and broke. Podman rootless may break because newuidmap has broken capabilities. And so on (see https://github.com/hhd-dev/rechunk/blob/master/1_prune.sh), with even more quirks we do not know about.
TLDR
In order for FROM fedora:40 to be possible, the following need to happen:
/usr/etc/polkit-1/rules.d)
Of course, there is still value in using ostree encapsulated commits in a bootc world:
For most users, who are not developers, it does not make sense to have to eat the update cost for distro maintainer DX, especially when rechunk can fix up the image in 7 min.
Tagging @cgwalters as the discussion with containers/bootc#215 affects bootc