WIP: Adapt to latest snapper #2697

Draft: davidcassany wants to merge 2 commits into main from adapt_to_latest_snapper

Conversation

@davidcassany (Collaborator) commented Dec 18, 2024

This PR is a reaction to the latest snapper features for better supporting image building and installation processes; see openSUSE/snapper#944

The fundamental idea is to prevent KIWI from running logic and configuration that is silently coupled to snapper internals. KIWI does not require snapper to build images in which the root is a snapshot; however, such images only make sense if snapper is present inside them. As a consequence, KIWI manually creates the .snapshots/1/snapshot subvolume structure and its .snapshots/1/info.xml metadata, and it also creates the root snapper configuration. All of these are snapper specific details managed without using snapper, on the assumption that snapper inside the image will make them meaningful.
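
For reference, the manually created structure looks roughly like this (the two paths are the ones named above; the exact nesting below the top-level volume depends on the volume setup):

<image-root>/
└── .snapshots/                 <- btrfs subvolume
    └── 1/
        ├── info.xml            <- snapshot metadata in snapper's format
        └── snapshot/           <- btrfs subvolume holding the root tree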

With the latest changes in snapper from the mentioned issue, snapper provides an installation-helper utility that helps us create the first empty snapshot (so we can easily dump the root-tree into it) and also generates the appropriate setup. The change frees KIWI from knowing about snapper internals and explicitly couples the btrfs_root_is_snapshot flag to snapper usage. The counterpart is requiring snapper as a build time dependency.
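
As a rough sketch, the intended flow from KIWI's side would be something along these lines. Note the installation-helper options shown here are assumptions for illustration only; the authoritative interface is described in openSUSE/snapper#944:

from kiwi.command import Command

root_dir = '/tmp/kiwi-build/image-root'  # hypothetical build root

# first step: let snapper create its config and the first empty
# snapshot (option names below are assumptions, not the real interface)
Command.run(
    ['/usr/lib/snapper/installation-helper', '--root', root_dir, '--step', '1']
)
# ... KIWI dumps the prepared root-tree into .snapshots/1/snapshot ...
# second step: let snapper finalize the snapshot metadata and setup
Command.run(
    ['/usr/lib/snapper/installation-helper', '--root', root_dir, '--step', '2']
)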

This PR is not complete as it does not address:

  • backward compatibility
  • snapper requirement in spec
  • unit tests

However, this is enough for rough testing and for discussing whether the change is worthwhile or not. The code in this PR just drops some btrfs calls and the manual snapper configuration setup in favor of the installation-helper.

@@ -438,6 +426,7 @@ def set_property_readonly_root(self):
        root_is_readonly_snapshot = \
            self.custom_args['root_is_readonly_snapshot']
        if root_is_snapshot and root_is_readonly_snapshot:
            # TODO we could call snapper modify here instead
            sync_target = self.get_mountpoint()
            Command.run(
                ['btrfs', 'property', 'set', sync_target, 'ro', 'true']
            )
@davidcassany (Collaborator, Author) commented Dec 18, 2024

Here we could run something like snapper --no-dbus --root <image_root> modify --read-only 1 if we require snapper as part of the build environment.
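
In KIWI terms that could look like the following minimal sketch; the snapper invocation is the one from this comment, everything around it is illustrative:

from kiwi.command import Command

image_root = '/tmp/kiwi-build/image-root'  # hypothetical path

# flip the first snapshot to read-only through snapper instead of
# calling 'btrfs property set <target> ro true' directly
Command.run(
    ['snapper', '--no-dbus', '--root', image_root,
     'modify', '--read-only', '1']
)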

davidcassany force-pushed the adapt_to_latest_snapper branch from 45e162e to 75640d1 on December 19, 2024 at 12:05
@schaefi (Collaborator) commented Dec 20, 2024

> and it explicitly couples the btrfs_root_is_snapshot flag to snapper usage.

I really like this. My only concern is that we would still need the "old" code somehow for backward compatibility. Do you know a stable way to identify when we need to run the fallback code path, other than "try with snapper first, fall back if it fails"?

> The counterpart is requiring snapper as a build time dependency.

I see the current code runs /usr/lib/snapper/installation-helper and also snapper from the build host. Do you think it would be possible to call the tools from the new root (chroot) instead? This way we could also say that we explicitly run the fallback code if no snapper tool was found in the root tree. Adding the needed tools then becomes a requirement of the image description, and a user needs to add <package name="snapper"/> or similar to complete the needed tooling. If we are pulled into the fallback code we can show this as a warning.

Overall I really like this initiative and we should do the proposed change. Thanks much

@davidcassany (Collaborator, Author) commented Dec 20, 2024

> and it explicitly couples the btrfs_root_is_snapshot flag to snapper usage.

> I really like this. My only concern is that we would still need the "old" code somehow for backward compatibility. Do you know a stable way to identify when we need to run the fallback code path, other than "try with snapper first, fall back if it fails"?

Well, for the time being this new snapper logic is merged upstream but not yet included in a specific release. My idea was to check the snapper version once a new release including the change is available, at the very least in TW. I don't think there is a better way to handle backward compatibility than keeping the old logic around in case we detect that the given snapper does not include the new logic. I dropped the code in this PR to make explicit which bits could eventually be removed at some point, just to make the change obvious for discussion.

> I see the current code runs /usr/lib/snapper/installation-helper and also snapper from the build host. Do you think it would be possible to call the tools from the new root (chroot) instead?

That is a good point and it should be possible; btrfs_root_is_snapshot would then require snapper in the image to actually build, which is good IMHO (note this concept could be extended to almost any tool used after the system create step... 🤔 ). The implementation might be a bit more invasive, as I am not sure how complex it would be to bring up the chroot environment there. In fact, if we do something like that I'd try to generalize the concept of chrooting into build/image-root/ to manipulate disks from the chroot; IIRC nothing like that happens anywhere in the build process. Probably this could be used in other contexts in the future too. I'll let this idea mature in my head during the Christmas break.

> Overall I really like this initiative and we should do the proposed change. Thanks much

Great, count on me to bring this to a mergeable state on early January 😉

@Conan-Kudo (Member) commented

I don't like this. There are a multitude of snapshot management tools for Btrfs; Snapper just happens to be the oldest one. I would actually prefer, if we're going to have a snapper-specific thing, that we explicitly create a snapper attribute instead of making something that looks tool-agnostic be snapper-specific.

@schaefi (Collaborator) commented Dec 31, 2024

> I don't like this. There are a multitude of snapshot management tools for Btrfs; Snapper just happens to be the oldest one. I would actually prefer, if we're going to have a snapper-specific thing, that we explicitly create a snapper attribute instead of making something that looks tool-agnostic be snapper-specific.

ok so we have the following in our schema

k.type.btrfs_root_is_snapshot.attribute =
        ## Tell kiwi to install the system into a btrfs snapshot
        ## The snapshot layout is compatible with the snapper management
        ## toolkit. By default no snapshots are used

This attribute is responsible for creating the snapshot layout suitable for snapper, and also suitable for SUSE. It's been used on SUSE btrfs based images only at the moment, and I suggest that we rename this attribute, e.g. to btrfs_root_is_snapper_snapshot. Along with the change, update the documentation and also mention that SUSE is the primary user of it and that using it will require snapper.

We would need an XSLT stylesheet to auto-rename the attribute and a schema version bump, but I would say it's worth it given the snapper dependency that is being introduced here.

In any case it's a good idea from David to get rid of the "clone code" in kiwi that produces a snapper compatible layout.

@davidcassany (Collaborator, Author) commented Jan 8, 2025

@schaefi @Conan-Kudo I agree with you; all snapshot management in KIWI is already highly coupled to snapper and I agree this should be explicit. I also doubt we can build a generic concept here, as snapper snapshots are something we must support (it's being used and has been there for years), and I doubt we can consider it a generic approach, meaning the same snapper snapshots being managed by a different tool.

So to wrap up, and according to the discussion, this is what we would need to push this forward:

  • A new snapper release (required changes already present in TW since snapper 0.12.1)
  • Adapt the XML to rename the btrfs_root_is_snapshot attribute to explicitly relate it to snapper (note this is an independent change)
  • Refactor the PoC of this PR to actually not require snapper in the host (use chroot), so no changes in spec would be required. (@davidcassany: I am on it )
  • Keep backward compatibility: keep the old logic behind a path selector based on the detected version of snapper in the root-tree (see the sketch right after this list)
  • Add and adapt tests
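
A minimal sketch of how such a version based path selector could look; the helper name and the output parsing are illustrative assumptions, only the 0.12.1 threshold comes from the list above:

from kiwi.command import Command

def image_snapper_has_installation_helper(root_dir: str) -> bool:
    # hypothetical helper, not code from this PR: ask the snapper
    # inside the root-tree for its version and compare to 0.12.1
    result = Command.run(['chroot', root_dir, 'snapper', '--version'])
    # output is expected to look like 'snapper 0.12.1' (assumption)
    version = result.output.split()[-1]
    parts = [int(x) for x in version.split('.')[:3]]
    return tuple(parts) >= (0, 12, 1)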

Does it make sense to you?

@schaefi (Collaborator) commented Jan 13, 2025

@davidcassany makes perfect sense to me 👍 I volunteer to implement

  • Adapt the XML to rename the btrfs_root_is_snapshot attribute to explicitly relate it to snapper (note this is an independent change)

schaefi added a commit that referenced this pull request Jan 13, 2025
Rename btrfs_root_is_snapshot to btrfs_root_is_snapper_snapshot.
This happens in preparation for the changes suggested in #2697,
where we want to get rid of snapper specific btrfs code which
will soon be available in snapper natively. To make clear that
the btrfs layout is specific to snapper (and SUSE), the implicitly
used attribute named btrfs_root_is_snapshot now becomes explicit,
and its new name indicates that snapper sits behind it. Along
with the rename, an XSLT stylesheet automatically converts the
old name into the new name for schema v8.3.
@davidcassany (Collaborator, Author) commented

@schaefi I am adapting the proof of concept to make use of chroots instead of calling snapper from the host; I will also paste here the config.xml I use for testing.

After that I'd like to get a general acknowledgement that the implementation I am doing actually matches expectations; basically, before getting into the mess of adding backward compatibility and reworking the tests, I'd like to make sure the implementation itself does not require a significant rework.

Chroot manager class can be used to setup
a chroot environment including an arbitrary
list of bind mounts. The provided paths to
bind are mounted in the chroot at the same
path they are bound from on the host.

This class is useful to setup a chroot
environment based on the root-tree created
in the prepare step.

Signed-off-by: David Cassany <[email protected]>
Signed-off-by: David Cassany <[email protected]>
davidcassany force-pushed the adapt_to_latest_snapper branch from 75640d1 to 795f0bf on January 17, 2025 at 12:09
@davidcassany (Collaborator, Author) commented

I adapted the PoC so that snapper is only called from the image root-tree; to do that, chrooting into <target-dir>/build/image-root/ is required, together with access to the btrfs partition mountpoints. Since this requires either new mountpoints for subvolumes under the root tree, or bind mounts of the mounted subvolumes into the actual root tree, I thought having a chroot context manager would be handy and eventually reusable. This is why a ChrootManager class is created. It allows doing something like:

with ChrootManager(root_dir, binds=['/sys', '/proc', '/dev']) as chroot:
    chroot.run(['snapper', '--no-dbus', ...])

In __enter__ it mounts the bind mounts and in __exit__ they are unmounted. I realized this does not follow exactly the same criteria as other context managers in KIWI, as it actually runs some setup logic in __enter__. Other KIWI context managers don't do that and I suppose there is a reason for that; is it due to having to handle exceptions in __enter__?
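
For illustration, a minimal sketch of what such a context manager could look like; the class name, the binds argument and the run() method follow the description above, while the body is illustrative and not the actual PR code:

import os
from typing import List

from kiwi.command import Command


class ChrootManager:
    # illustrative sketch: bind-mount the given host paths into
    # root_dir on __enter__ and unmount them again on __exit__
    def __init__(self, root_dir: str, binds: List[str]):
        self.root_dir = root_dir
        self.binds = binds

    def __enter__(self) -> 'ChrootManager':
        for path in self.binds:
            Command.run(
                ['mount', '--bind', path, self.root_dir + path]
            )
        return self

    def run(self, command: List[str]):
        # execute the given command chrooted into root_dir
        return Command.run(['chroot', self.root_dir] + command)

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        for path in reversed(self.binds):
            Command.run(['umount', self.root_dir + path])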

I am not fully convinced about the ChrootManager as it feels and looks similar to the ImageSystem class; however, ImageSystem assumes there is a fully functional root-tree on the device, which is not the case when the btrfs and snapper operations involved in this PR are called. Probably we could simply have helper functions inside VolumeManagerBtrfs, but that prevents any possibility of later reuse. All in all I kept the ChrootManager class; as it is small and simple, it should be easy to rework if someone has better ideas or concerns.

The reason I think reuse might be interesting is that using the image root-tree as a chroot environment to operate on the devices is a powerful concept that could potentially be extended to other use cases. I wonder if LVM tools and filesystem tools couldn't also be leveraged from the image tree as well, removing the requirement for them on the host itself. Wouldn't it be an option? Obviously this would be a significant refactor, probably something we don't want, but I think it is worth a separate discussion.

@davidcassany (Collaborator, Author) commented

To test the current PR I used a simplified version of the openSUSE-MicroOS image; I only kept a single profile:

<?xml version="1.0" encoding="utf-8"?>
<image schemaversion="7.4" name="openSUSE-MicroOS" displayname="openSUSE MicroOS">
    <description type="system">
        <author>openSUSE Project</author>
        <contact>[email protected]</contact>
        <specification>openSUSE MicroOS</specification>
    </description>
    <profiles>
        <profile name="kvm-and-xen" description="kvm-and-xen" arch="x86_64" import="true"/>
    </profiles>
     <preferences profiles="kvm-and-xen" arch="x86_64">
        <version>16.0.0</version>
        <packagemanager>zypper</packagemanager>
        <bootloader-theme>openSUSE</bootloader-theme>
        <rpm-excludedocs>true</rpm-excludedocs>
        <locale>en_US</locale>
        <type image="oem" filesystem="btrfs" format="qcow2" firmware="uefi" bootpartition="false"
            bootkernel="custom" devicepersistency="by-uuid" btrfs_root_is_snapshot="true"
            btrfs_root_is_readonly_snapshot="true" btrfs_quota_groups="true">
            <bootloader name="grub2" console="gfxterm" />
            <systemdisk>
                <volume name="home"/>
                <volume name="root"/>
                <volume name="opt"/>
                <volume name="srv"/>
                <volume name="boot/grub2/i386-pc"/>
                <volume name="boot/grub2/x86_64-efi" mountpoint="boot/grub2/x86_64-efi"/>
                <volume name="boot/writable"/>
                <volume name="usr/local"/>
                <volume name="var" copy_on_write="false"/>
            </systemdisk>
            <size unit="G">20</size>
        </type>
    </preferences>
    <repository type="rpm-md" alias="repo-oss">
        <source path="http://download.opensuse.org/tumbleweed/repo/oss/"/>
    </repository>
    <packages type="image">
        <package name="live-add-yast-repos"/>
        <package name="patterns-microos-basesystem"/>
        <package name="patterns-microos-base-zypper"/>
        <package name="patterns-microos-defaults"/>
        <package name="patterns-microos-selinux"/>
        <package name="kernel-default"/>
        <package name="dracut-kiwi-oem-repart"/>
        <package name="device-mapper"/>
        <package name="cryptsetup"/>
    </packages>
    <packages type="image" profiles="kvm-and-xen">
        <package name="ignition-dracut"/>
        <package name="combustion &gt;= 1.2"/> <!-- New firstboot mechanism -->
        <package name="jeos-firstboot"/>
        <package name="growpart-generator"/>
        <package name="patterns-base-bootloader"/>
    </packages>
    <packages type="image" profiles="kvm-and-xen">
        <!-- KVM and Xen specific packages -->
        <package name="xen-tools-domU" arch="x86_64"/>
        <package name="qemu-guest-agent"/>
    </packages>
    <packages type="bootstrap">
        <package name="coreutils"/>
        <package name="gawk"/>
        <package name="gzip"/>
        <package name="hostname"/>
        <package name="openssl"/>
        <package name="filesystem"/>
        <package name="glibc-locale-base"/>
        <package name="ca-certificates-mozilla"/>
        <package name="MicroOS-release-dvd"/>
    </packages>
</image>

@schaefi (Collaborator) commented Jan 18, 2025

@davidcassany:

> Other KIWI context managers don't do that and I suppose there is a reason for that; is it due to having to handle exceptions in __enter__?

Remember that the move to context managers was a refactoring of existing classes. Therefore the required __enter__ and __exit__ methods were added to the existing implementations so as not to break the interface. Also, if you design your context managers in this way:

class Foo:
    def __init__(self):
        print('some foo')

    def __enter__(self):
        return self

    def cleanup(self):
        print('done')

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.cleanup()

You can create an object instance that runs the constructor code the same way in both usage patterns:

with Foo() as foo:
    pass

foo = Foo()
foo.cleanup()

In your case this does not apply, as you are introducing a new class and you can decide how people should use it. Context manager only is fine with me.

@schaefi (Collaborator) commented Jan 18, 2025

> I am not fully convinced about the ChrootManager as it feels and looks similar to ImageSystem

You are right when you say that ImageSystem expects a full system to be mountable; that was the reason why I added that class in the past. I like the additional ChrootManager class to invoke commands chrooted. I was thinking that instead of a new class you could also turn the Command class into a context manager, such that we can call:

with Command(root_dir, binds=['/sys', '/proc', '/dev']) as chroot:
    chroot.run(['snapper', '--no-dbus', ...])

Thoughts ?

@schaefi (Collaborator) commented Jan 18, 2025

> I wonder if LVM tools and filesystem tools couldn't also be leveraged from the image tree as well, removing the requirement for them on the host itself. Wouldn't it be an option?

yes, it would be an option to move more commands that we currently call from the host into chroot calls. I just can't estimate the after effects of this move. I'm open to doing this one after the other in separate PRs, so we can see if the integration tests that make use of the commands we moved still work. Would that be a plan?

@schaefi (Collaborator) commented Jan 18, 2025

> @schaefi I am adapting the proof of concept to make use of chroots instead of calling snapper from the host; I will also paste here the config.xml I use for testing.
>
> After that I'd like to get a general acknowledgement that the implementation I am doing actually matches expectations; basically, before getting into the mess of adding backward compatibility and reworking the tests, I'd like to make sure the implementation itself does not require a significant rework.

Implementation wise this is exactly what I had expected: a lot of code paths removed and replaced by snapper calls. Thanks much for doing this. Regarding ChrootManager, see my idea in the former comment. IIRC all changed code is also only active if btrfs_root_is_snapper_snapshot is set. Given you plan backward compatibility, there should also be no problem for systems that have no snapper or only an older version of it.

Overall you have my vote on this. We should wait for @Conan-Kudo though

@Conan-Kudo (Member) left a review:

Some notes about implementation that should be resolved.

Comment on lines +414 to +417

        if self.custom_args['root_is_snapper_snapshot']:
            root_prefix = os.path.join(
                self.mountpoint,
                f'{self.root_volume_name}/.snapshots/1/snapshot'
            )
@Conan-Kudo (Member) commented:

Can we have snapper tell us what this is? This is still an assumption about how snapper works, rather than snapper telling us stuff.

@davidcassany (Collaborator, Author) commented:

Indeed this is a tough one and I fully see your point. However, I am not sure how this could actually be solved; all of this refactor is based on the two step installation helper introduced in snapper 0.12.1 (see openSUSE/snapper#944). From a snapper PoV this could be solved with docs; I mean, this is just a constant in the installation helper. I'd say it always creates a subvolume with ID=1, and the snapshots structure is also a constant. Gonna have a look if there is anything from snapper to provide the path for a given snapshot or something similar.

Comment on lines +512 to +514

        # snapper requires an extra parent qgroup to operate with quotas
        Command.run(
            ['btrfs', 'qgroup', 'create', '1/0', self.mountpoint]
        )
@Conan-Kudo (Member) commented:

This is again something that snapper should do rather than us.

@davidcassany (Collaborator, Author) commented:

Not so sure about this; I don't know enough about this use case to have a proper opinion, but at a glance I am not convinced this is among snapper's responsibilities. Snapper only manages subvolumes in a CRUD fashion and it expects a btrfs filesystem there, ready to be used. We are just telling snapper which quota group to use on the existing filesystem.
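
For context, a sketch of the quota setup this diff is part of; the quota enable call is an assumption about what has to happen beforehand, only the qgroup create line is taken from the diff above:

from kiwi.command import Command

mountpoint = '/tmp/kiwi-build/mountpoint'  # hypothetical path

# btrfs quotas must be enabled on the filesystem before qgroups
# can be created (assumed to happen elsewhere in KIWI)
Command.run(['btrfs', 'quota', 'enable', mountpoint])

# create the parent qgroup '1/0' (level 1, id 0) that snapper uses
# to group the per-snapshot qgroups when operating with quotas
Command.run(['btrfs', 'qgroup', 'create', '1/0', mountpoint])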

@Conan-Kudo (Member) commented

> @schaefi I am adapting the proof of concept to make use of chroots instead of calling snapper from the host; I will also paste here the config.xml I use for testing.
>
> After that I'd like to get a general acknowledgement that the implementation I am doing actually matches expectations; basically, before getting into the mess of adding backward compatibility and reworking the tests, I'd like to make sure the implementation itself does not require a significant rework.
>
> Implementation wise this is exactly what I had expected: a lot of code paths removed and replaced by snapper calls. Thanks much for doing this. Regarding ChrootManager, see my idea in the former comment. IIRC all changed code is also only active if btrfs_root_is_snapper_snapshot is set. Given you plan backward compatibility, there should also be no problem for systems that have no snapper or only an older version of it.
>
> Overall you have my vote on this. We should wait for @Conan-Kudo though

This makes sense to me too. The key thing, though, is that we shouldn't be doing any configuration of the volume ourselves if we expect snapper to manage it. We need to go fully to "we don't know wtf is going on" in this case.

@davidcassany (Collaborator, Author) commented

> yes, it would be an option to move more commands that we currently call from the host into chroot calls. I just can't estimate the after effects of this move. I'm open to doing this one after the other in separate PRs, so we can see if the integration tests that make use of the commands we moved still work. Would that be a plan?

Sure, something like that should be explored in small steps. For instance, let's try to run LVM support this way, focusing only on LVM tools, and see how good or painful it turns out to be.
