Calculate digest from tarball #895
But the digest comes from the remote image, so it cannot be calculated from the local image (unless it was saved along with it).
You will find that "RepoDigests" is empty. It's a shortcoming of the image model...
Sorry, I must be missing something :) Doesn't the container registry just apply a hash algorithm to the bytes it receives? After all, the digest is identical if I push to multiple CRs. Why can't it be calculated the same way on the client side, assuming one knows the hash algorithm used by the specific CR one is targeting?
I meant that you need to push the image to the registry in order for docker to calculate the digest. I also totally misremembered what we had patched: we still don't save the digest in the local tarball.
I was referring to moby/moby#32016. The registry hashes compressed layers.
Can't we take the code that calculates the digest from the container registry implementation and add it to crane, compressing the layers if necessary? From that post, it looks like Bazel does this, so it should be possible :)
The way I have understood it is that the … It's similar to downloading regular files: you have your git commit and git archive, and you have your .tar.gz and checksum. Even though they reference the same files, it's not possible to "guess" the checksum without knowing the server, etc.(*)
(*) As described in "pristine-tar", a small delta remains (compression artifacts like timestamps and other noise).
I'm starting to think that the … We have some users who want to compare "foo:latest" with "foo:latest", and for those we will use … For example, with minikube we might have one image in the cache on the host and one image stored in the cluster, so it would be nice to know whether we need to upload/uncompress a new image or whether the old one is OK: kubernetes/minikube#10075
It's a constant source of confusion, maybe because they use the same algorithm (sha256). As in #627: why it is faster to look up a value locally than to download the manifest from the registry.
Yes, crane could calculate the digest that this specific version of crane would produce when pushing a tarball. For this to be useful to you, you would need to make sure that the thing calculating the digest and the thing pushing the image are identical. If you were to calculate the digest with crane and push with docker, there would be no guarantee that the two match. This is an unfortunate property of how most tools produce images in the tarball format, but it is not impossible to work around. Depending on what is producing the tarballs, you could even make this problem go away entirely.

@aelij, are these tarballs the output of …

I would actually be fine with just adding something like:
Not sure about the exact flags, but you get the point. It would be pretty trivial.
Is this something that docker embeds in tarballs? I think it would be reasonable for us to add that, but I hadn't seen it before.
This has not been the case for quite some time; however, there is one pathological case where docker will behave in a way that might make this appear to be true. I've been meaning to write this up as a little micro blog post because I thought it was interesting, but since I haven't done that yet, here is as reasonable a place as any, so you get a rough draft:
I was talking about the … It calls them Id, RepoTags and RepoDigests:

```
$ docker inspect busybox | head
[
    {
        "Id": "sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:49dae530fd5fee674a6b0d3da89a380fc93746095e7eca0f1b70188a95fd5d71"
        ],
        "Parent": "",
```

It's not saved to the tarballs, though. (Neither is …) The tarball's manifest.json only has:

```json
[
  {
    "Config": "a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291.json",
    "RepoTags": [
      "busybox:latest"
    ],
    "Layers": [
      "b8383576921fcf341dc0221e7879d8a807e00b52b3bd22fefb532819109be313/layer.tar"
    ]
  }
]
```
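In the output above, the Id (a77dce…) matches the config file name in manifest.json. So the local image ID can be recovered from the tarball, even though RepoDigests cannot. Here is a minimal Go sketch of that lookup. It assumes the config file name is content-addressed, either `<hex>.json` as in the busybox example or an OCI-style `blobs/sha256/<hex>` path. `imageIDs` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"strings"
)

// saveEntry mirrors one element of the manifest.json written by `docker save`.
type saveEntry struct {
	Config   string
	RepoTags []string
	Layers   []string
}

// imageIDs derives each image's ID from its config file name, assuming the name is
// content-addressed ("<hex>.json" or "blobs/sha256/<hex>").
func imageIDs(manifestJSON []byte) ([]string, error) {
	var entries []saveEntry
	if err := json.Unmarshal(manifestJSON, &entries); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(entries))
	for _, e := range entries {
		ids = append(ids, "sha256:"+strings.TrimSuffix(path.Base(e.Config), ".json"))
	}
	return ids, nil
}

func main() {
	// The busybox manifest.json quoted above.
	manifestJSON := []byte(`[{
		"Config": "a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291.json",
		"RepoTags": ["busybox:latest"],
		"Layers": ["b8383576921fcf341dc0221e7879d8a807e00b52b3bd22fefb532819109be313/layer.tar"]
	}]`)
	ids, err := imageIDs(manifestJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids[0]) // sha256:a77dce18d0ecb0c1f368e336528ab8054567a9055269c07c0169cba15aec0291
}
```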
We just want something that can be fed into (an imaginary) … Since CRI doesn't have an API to load from cached files (sadly), we invented our own abstraction for all three supported runtimes: https://github.com/kubernetes/minikube/blob/v1.16.0/pkg/minikube/cruntime/cruntime.go#L96

```go
// LoadImage loads an image into this runtime
func (r *Docker) LoadImage(path string) error {
	klog.Infof("Loading image: %s", path)
	c := exec.Command("docker", "load", "-i", path)
	if _, err := r.Runner.RunCmd(c); err != nil {
		return errors.Wrap(err, "loadimage docker.")
	}
	return nil
}

// LoadImage loads an image into this runtime
func (r *CRIO) LoadImage(path string) error {
	klog.Infof("Loading image: %s", path)
	c := exec.Command("sudo", "podman", "load", "-i", path)
	if _, err := r.Runner.RunCmd(c); err != nil {
		return errors.Wrap(err, "crio load image")
	}
	return nil
}

// LoadImage loads an image into this runtime
func (r *Containerd) LoadImage(path string) error {
	klog.Infof("Loading image: %s", path)
	c := exec.Command("sudo", "ctr", "-n=k8s.io", "images", "import", path)
	if _, err := r.Runner.RunCmd(c); err != nil {
		return errors.Wrapf(err, "ctr images import")
	}
	return nil
}
```

We normally use scp (or a similar ssh variation of cat) to copy the files. The host is only expected to have
I'm starting to give up on Docker (and even more so on Podman) as well, but that's a different discussion...

Thanks a lot for the detailed explanation, and I hope it also helped the original poster about what … It reminds me of the discussions we had with the "reproducible builds" community about timestamps, etc. I actually thought that we were all using …

I wonder if this policy has anything to do with the ancient images I've seen? :-)
i.e. the timestamp has been removed
Indeed! Bazel does the same thing by overriding the creation timestamp:
Support for docker loading OCI image layouts stalled in moby/moby#33355. I haven't dug through all the related issues to see the current status, but presumably it will eventually work as docker relies more on containerd. Most newer tools are starting to support image layouts, so that might be the way forward, eventually.
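An OCI image layout avoids the recomputation problem. Its index.json lists manifest descriptors that already carry each manifest's digest, so a tool that pushes that manifest blob unchanged ends up with the same digest. Below is a minimal Go sketch that reads the digest for a given `org.opencontainers.image.ref.name` annotation. `digestForRef` is hypothetical, and the index in `main` is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ociIndex mirrors the fields of an OCI image layout's index.json used here.
type ociIndex struct {
	SchemaVersion int `json:"schemaVersion"`
	Manifests     []struct {
		MediaType   string            `json:"mediaType"`
		Digest      string            `json:"digest"`
		Size        int64             `json:"size"`
		Annotations map[string]string `json:"annotations"`
	} `json:"manifests"`
}

// digestForRef returns the digest of the manifest annotated with the given ref name.
func digestForRef(indexJSON []byte, ref string) (string, error) {
	var idx ociIndex
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return "", err
	}
	for _, m := range idx.Manifests {
		if m.Annotations["org.opencontainers.image.ref.name"] == ref {
			return m.Digest, nil
		}
	}
	return "", fmt.Errorf("no manifest with ref name %q", ref)
}

func main() {
	fake := "sha256:" + strings.Repeat("0", 64) // placeholder, not a real image
	indexJSON := []byte(`{"schemaVersion":2,"manifests":[{"mediaType":"application/vnd.oci.image.manifest.v1+json",` +
		`"digest":"` + fake + `","size":123,"annotations":{"org.opencontainers.image.ref.name":"latest"}}]}`)
	d, err := digestForRef(indexJSON, "latest")
	if err != nil {
		panic(err)
	}
	fmt.Println(d == fake) // true
}
```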
@jonjohnsonjr Thanks so much for the detailed explanation! And the PR :) I had not imagined this being so complex.
Yes. Would that work? We've migrated to using crane for pushing images.
Is the tag/registry needed here? From what I can tell, the digest does not depend on them.
Not necessarily, as long as there's only one image in the tarball. The way I structured that PR was nice because I didn't have to change anything about the
This issue is stale because it has been open for 90 days with no
/remove-lifecycle stale
Since pkg/drivers/kic/types.go is the source of truth for the version of the container we use to instantiate our Kubernetes cluster, it should be appropriate to hardcode the imageId (a.k.a. contentDigest) here, so that it can later be used as a discriminant to invalidate minikube's cache.

contentDigest is the most reliable way to address image content: if the image is tampered with after being pushed to a registry, the contentDigest we'd see after pulling would differ from the one hardcoded here. It is also part of the image itself, i.e. part of the tar archive, which gives us a way to always know whether the cache is up to date, even offline.

distributionDigest is the most reliable way to determine which image we want to pull from a registry; a tag can be detached from an image and recycled to reference another image with different content. It is not part of the image itself; it is computed on the image in its compressed state, and since different engines/mechanisms may use different compression, this digest is totally unreliable as a way to address content.

[*] refs: https://windsock.io/explaining-docker-image-ids/ google/go-containerregistry#895 (comment) https://stackoverflow.com/questions/45533005/why-digests-are-different-depend-on-registry https://blog.aquasec.com/docker-image-tags -- follow links
Since pkg/drivers/kic/types.go is the source of truth for the version of the container we use to instantiate our Kubernetes cluster, the PR should start here. Initially I thought about hardcoding the contentDigest (a.k.a. imageId) here as well, and then using it to check against the images inside the kicDriver. It later took another turn (we're retrieving it from the tar). Plus, a collaborator showed me that it was a bad idea: maintaining it here would mean bumping it as part of the image build process. The idea is based on the following concepts:

- contentDigest is the most reliable way to address image content: if the image is tampered with after being pushed to a registry, the contentDigest we'd see after pulling would differ from the one hardcoded here. It is also part of the image itself, i.e. part of the tar archive, which gives us a way to always know whether the cache is up to date, even offline.
- distributionDigest is the most reliable way to determine which image we want to pull from a registry; a tag can be detached from an image and recycled to reference another image with different content. It is not part of the image itself; it is computed on the image in its compressed state, and since different engines/mechanisms may use different compression, this digest is totally unreliable as a way to address content.

[*] refs: https://windsock.io/explaining-docker-image-ids/ google/go-containerregistry#895 (comment) https://stackoverflow.com/questions/45533005/why-digests-are-different-depend-on-registry https://blog.aquasec.com/docker-image-tags -- follow links
Our build system produces tarballs and K8s deployment configurations, later to be deployed to multiple environments. We would like to get the image's digest at build time so we could reference it in the configuration, rather than relying on a tag, which is not stable.
Could crane calculate the digest from a tarball? From what I gather, `crane digest` only works with a remote image.