Making Cincinnati updates work with ostree containers #1263
(Originally posted by @cgwalters in #1219 (comment)) But regarding zincati specifically:

half-baked strawman: embed barriers in the container image

We encode "epochs"/barriers in like this:
We embed metadata in the container images that says that This is related to ostreedev/ostree#874 |
Another half-baked proposal:

Tag-based updates

Zincati itself doesn't have much OSTree knowledge. Its main output is telling rpm-ostree which version to deploy. This model could carry over in the CoreOS layering world with the right semantics in place:
Rolling out out-of-cycle changes

When users modify their layered content, they might not want to wait until the next FCOS release to roll it out. What I don't want is Zincati learning to speak to container registries. Instead, we can have it periodically ask rpm-ostree to check if updates to that tag are available. There's a similarity here with client-side RPM layering from repos: new RPM versions won't actually be updated until the next release or if one explicitly does |
I've updated this section of the proposal based on discussions OOB with @lucab. |
I'm not so sure. I don't feel like this is super heavyweight IMO. I guess it gets more complicated if there is authentication involved. |
What's cool is: I specifically split out Rust bindings to use skopeo into a dependency of ostree to support use cases exactly like this! (That project is also already being used by at least one project not published on crates.io. More recently there is apparently also https://github.com/confidential-containers/image-rs/blob/main/docs/design.md, which explicitly cites the proxy code.) |
This all said - storing things in registries today that aren't actually runnable containers is awkward. But, OCI Artifacts is coming to make that better. |
The reason I said this was that I didn't want yet another thing pulling in a container stack, but |
There was some out of band discussion on this as it relates to Fedora IoT and the idea of supporting Cincinnati there too; the more I think about this the more I feel like it makes sense to entirely fold the core functionality of zincati into rpm-ostree at some point. (The whole thing we did with "update drivers" is really complex, and while I think it's still logically something we want for the general complex case, it'd be a lot more obvious from a UX point of view to have e.g. |
This is part of coreos/fedora-coreos-tracker#1263 If we're booted into a container image, then instead of looking for the special `fedora-coreos.stream` ostree commit metadata, we can do the much more obvious and natural thing of looking at the container image tag.
Some work on this in coreos/zincati#878 |
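To make the "look at the container image tag" idea concrete, here is a minimal sketch, not Zincati's actual code. It assumes a plain registry-style reference such as `quay.io/fedora/fedora-coreos:stable` (no transport prefix), and the function name `stream_from_image_ref` is hypothetical.

```python
from typing import Optional


def stream_from_image_ref(image_ref: str) -> Optional[str]:
    """Return the tag of a container image reference, or None if it has none.

    Assumption for this sketch: the tag is what identifies the update stream.
    """
    # Drop a digest suffix such as "@sha256:..." if present.
    name = image_ref.split("@", 1)[0]
    # Only the last path component can carry a tag; a colon earlier in the
    # reference belongs to a registry port such as "localhost:5000".
    last_component = name.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.rsplit(":", 1)[1]
    return None
```

A reference pinned only by digest has no tag, so the sketch returns `None` there; a caller would have to decide what that means for automatic updates.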
This is part of coreos/fedora-coreos-tracker#1263 We don't yet have an official stance on how zincati and custom container images interact. Today, zincati just crash loops. This changes things so that we gracefully exit if we detect the booted system is using a container image origin. (The code here isn't quite as clean as it could be; calling `std::process::exit()` in the middle of the call chain isn't elegant, but doing better would require plumbing an `Option<T>` through many layers.)
From discussion in coreos/zincati#878 related to coreos/fedora-coreos-tracker#1263
This is part of coreos/fedora-coreos-tracker#1263 We don't yet have an official stance on how zincati and custom container images interact. Today, zincati just crash loops. This changes things so that we exit (but still with an error) if we detect the booted system is using a container image origin. One nicer thing here is that the unit status is also updated, e.g. `systemctl status zincati` will show: `Status: "Automatic updates disabled; booted into container image ..."`
I think this currently blocks on #1367 |
Re things the graph gives us, I'd split the second one and add one more: 2a. Scheduled rollouts over a defined time interval. Re dropping the barrier releases, I'm not so concerned about the runtime cost of carrying upgrade code (it's usually a shell script, with a unit that can check for a stamp file), but I do think regressions are a concern. We'd need to get better at carrying tests for upgrading from a specified older release. Key rotation seems harder to solve though. |
2a/2b can also be done by having the registry server itself do this, right? (Now, an interesting topic here is whether we'd want to somehow apply the same policies to clients fetching it as a container image via podman/docker/kube and not to boot directly)
|
The redirect approach sounds interesting. If we want to keep the current wariness stuff (which is how the phasing happens), we'd have to somehow have the request include a stable UUID, e.g. as an HTTP header.
The ostree EOL metadata key addresses the case where you know at build time that it's going to be the last commit in the stream. Deadend releases are usually understood to be deadends after the fact. Unless you mean including information about past deadend releases in the metadata of new images we push on that stream, until the deadend releases go EOL. It's a bit of a hack, but nice in its simplicity (and anyway, deadend releases should be quite rare, so I don't expect that metadata to grow out of control). I guess another approach is keeping it as a separate OCI artifact. |
Yes, this would require a bit of work in the containers/image (and proxy) stack to request adding a header, but that seems straightforward.
Right, we can push a new image that just changes the manifest, but leaves all the blobs the same. (Right now, the bootc stack actually will reboot in the metadata-only case, but we can optimize that...) |
That's not quite what I meant. :) I mean putting information about old deadends into new images and carrying it for a while. Changing manifests we've already pushed for the same version doesn't feel right. (Edit: well... I don't know, maybe it's called for given that deadends are exceptional events. But it feels funny changing the digest of an existing image.) |
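As a rough illustration of carrying past deadends in the metadata of newer images, a client could check its booted version against a list shipped as an image label. This is only a sketch: the label key `example.deadends` and its JSON layout (version mapped to reason) are assumptions made up for this example, not an existing convention.

```python
import json
from typing import Mapping, Optional

# Hypothetical label key; nothing in the thread defines a real one.
DEADENDS_LABEL = "example.deadends"


def deadend_reason(labels: Mapping[str, str], booted_version: str) -> Optional[str]:
    """Return the deadend reason for booted_version, or None if it is not listed."""
    raw = labels.get(DEADENDS_LABEL)
    if raw is None:
        return None
    # Assumed layout: a JSON object mapping a deadend version to a reason string.
    deadends = json.loads(raw)
    return deadends.get(booted_version)
```

Because deadends are expected to be rare, the label would stay small, and entries could be dropped once the deadend releases themselves go EOL.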
The redirection server could be generic over derived images too: ...also this reinvents orchestration a bit, which doesn't feel great. In that sense, Zincati/Cincinnati is closer to the existing k8s model, where there's an external orchestrator deciding what to pull. The label approach may be easier to deploy though.
I thought I remembered that at one point the FCOS Cincinnati handshake was changed to avoid sending a client UUID, since we wanted to avoid uniquely identifying the client. Instead, either the conditional update would be encoded into the graph so that the wariness could be applied client-side, or the client would pick a wariness for each rollout and send it to the server. Looking at the code, apparently that never happened in the default case, and we are indeed sending a UUID to the server. Perhaps we should avoid perpetuating that design, though, and the client should send a wariness value instead of a UUID. While a sufficiently fine-grained wariness would uniquely identify the client for the period between successive updates, it wouldn't be a long-term identifier. |
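The variant where the client picks a wariness for each rollout, instead of sending a stable UUID, could look roughly like this. It is a sketch under assumptions: the linear rollout curve, the function names, and the parameters `start_epoch` and `duration_minutes` are illustrative and not taken from the Zincati or Cincinnati code.

```python
import random


def pick_wariness(rng: random.Random) -> float:
    """Draw a fresh wariness in [0, 1) for one rollout; nothing is persisted."""
    return rng.random()


def rollout_reached(wariness: float, start_epoch: int,
                    duration_minutes: int, now_epoch: int) -> bool:
    """True once the rollout's progress fraction has reached this wariness.

    Assumes a linear rollout from 0% at start_epoch to 100% after
    duration_minutes.
    """
    if now_epoch < start_epoch:
        return False
    if duration_minutes <= 0:
        # No phasing requested: everyone is eligible immediately.
        return True
    elapsed = now_epoch - start_epoch
    progress = min(1.0, elapsed / (duration_minutes * 60))
    return progress >= wariness
```

Since the value is redrawn for every rollout, it can only correlate requests within one rollout window, which matches the point above that it would not be a long-term identifier.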
The reference for the existing protocol is https://coreos.github.io/zincati/development/cincinnati/protocol/#graph-api, while Zincati configuration details are at https://coreos.github.io/zincati/usage/agent-identity/#identity-configuration. On the two points above about UUIDs and wariness:
|
Fedora CoreOS is not yet using containers by default for updates; xref coreos/fedora-coreos-tracker#1263 etc.

Consequently, when one boots a FCOS system and wants to rebase to a custom image, one ends up downloading the entire image, including the parts of FCOS that you already have.

This changes things so that when we generate disk images by default, we write the *layer refs* of the component parts - but we delete the "merged" container image ref.

The semantics here will be:

- Only a tiny amount of additional data used by default; the layer refs are just metadata, the bulk of the data still lives in regular file content.
- When a FCOS system auto-updates to its by-default usage of an ostree commit, the unused layer refs will be garbage collected.
- But, as noted above when rebasing to a container image instead, if the target container image reuses some of those layers (as we expect when rebasing FCOS to a FCOS-derived container) then we don't need to redownload them - we only download what the user provided.

Hence, this significantly improves rebasing to container images, with basically no downsides.

The alternative code path to actually deploy *as a container* remains off by default. When that is enabled, `rpm-ostree upgrade` fetches a container by default, which is a distinct thing.
I re-read this thread and I still find myself somewhat unconvinced that we need to require an update graph instead of just fetching from a container tag. It definitely has value, but comes with a lot of operational complexity as a cost. Of all of the things discussed here, I think we need to take a decision on whether or not we try to require barriers in the future. My vote is no, we just carry the upgrade code for longer, say a year (two fedora majors). |
Moving the thread re bootloaders from #1485 (comment) To be clear, what I was just trying to say is I found it slightly confusing to have to dig through git history to find the failing systemd unit; if we'd tried to do something without barriers, then the unit would have probably had a "stamp file" approach of e.g. I would agree that it seems hard to solve this particular problem without barriers. That said, there is a whole other new thing we could add here, which is a mechanism to pull and execute a container image before trying to apply an OS update. That would align with what we're doing effectively in OCP with the MCO, and be an extremely flexible and powerful escape hatch. Basically zincati (or maybe rpm-ostree) would do e.g. |
Right. We sometimes use the stamp file approach and don't super aggressively remove the unit until the next barrier is created (i.e. when we put in migration code we don't always have to do a barrier at the same time like we had to here).
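For readers unfamiliar with the stamp file approach, here is a minimal sketch of the pattern. In FCOS it is usually a shell script plus a systemd unit that checks for the stamp; the helper name `run_once` and the paths used with it are hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_once(stamp: Path, migrate: Callable[[], None]) -> bool:
    """Run migrate() unless the stamp file already exists.

    Returns True if the migration ran, False if it was skipped.
    """
    if stamp.exists():
        return False
    migrate()
    # Write the stamp only after migrate() returned, so a migration that
    # raised an exception is retried on the next boot.
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
    return True
```

The migration stays cheap to carry across many releases because every later boot only pays for a file existence check, which is why the unit does not have to be removed at the same time a barrier is created.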
Thanks. This problem was super tricky and we were lucky we had barriers and
I've thought about this problem before and brought it up. In my mind it would just be extra code we shipped in the OSTree commit we build today that would know how to handle/do migrations depending on different factors. i.e. the OSTree gets downloaded (from a container registry or OSTree repo, doesn't matter) and special migration code gets extracted from it and run before. This code could choose to block the upgrade or allow it to continue, etc. I don't really see why this would need to be a separate container image (just seems like more work IMO). |
Pre-upgrade logic could have different dependencies than the OS, for one. And we wouldn't pay the storage cost of carrying them on disk after completion. |
Right, in which case we could just have the migration code call |
OK, moving this to ostreedev/ostree#2855 and I think with that, we can stop requiring barriers in many cases. That could use some further analysis, but going through a few of them (e.g. the recent bootloader one, or the iptables one) I am pretty sure it'd be viable. |
I'm really not a fan of pre-upgrade code execution, but I agree it's a way out of this. Between hooks and barriers, I'd indeed prefer the former. If we do this, I'd like to see tight policies and maintenance around what we do in there. (E.g. we can say we drop workarounds there after X months.) I'd agree with @dustymabe re. container runtimes. I think all our migration code so far has been not too complex scripts. It's not ideal, but it also has minimal impact on the node state, and dependency concerns are much less of an issue. |
My inclination BTW is to try to drop the rollouts and wariness etc. and keep it super simple - the client tracks a container image tag, fetched once a day by default. Anyone who wants to do anything more than that (replicating something like current "wariness") can mirror the OS update containers on their own to a registry and update it on their own timeframe and schedule. Which is what they already need to know how to do for application containers! Because there's no zincati/cincinnati for podman/kubelet (or for dnf/RPMs for that matter). |
This is split out of #1219 to discuss specifically how we'll make system updates work on hosts using CoreOS layering. Quoting:
updates to systems. When following a container image in a registry the
user is following whatever is latest. Work still needs to be done to
get back the added value from Zincati, into the CoreOS Layering workflow.
Let's discuss ideas on how to address this gap.