Digest is lost after loading a saved image #22011

Open
liron-l opened this issue Apr 13, 2016 · 19 comments
Labels
area/distribution containerd-integration Issues and PRs related to containerd integration version/1.10

Comments

@liron-l
Contributor

liron-l commented Apr 13, 2016

Output of docker version:

```
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:59:07 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:59:07 2016
 OS/Arch:      linux/amd64
```

Output of docker info:

```
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.10.3
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.2.0-25-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.06 GiB
Name: liron-srv
ID: 32RZ:CFUB:DHXV:UVCO:CALO:XNZE:7CAJ:YDSR:AHZB:A2QK:46R6:3525
Username: twistlockreader
Registry: https://index.docker.io/v1/
```

Additional environment details (AWS, VirtualBox, physical, etc.):
physical

Steps to reproduce the issue:

  1. Pull with digest: `docker pull "alpine@sha256:4f2d8bbad359e3e6f23c0498e009aaa3e2f31996cbea7269b78f92ee43647811"`
  2. The image has a digest: `docker images --digests`
  3. Save the image: `docker save alpine@sha256:4f2d8bbad359e3e6f23c0498e009aaa3e2f31996cbea7269b78f92ee43647811 > test.tar`
  4. Delete the image: `docker rmi alpine@sha256:4f2d8bbad359e3e6f23c0498e009aaa3e2f31996cbea7269b78f92ee43647811`
  5. Load the saved image: `docker load < test.tar`
  6. The image no longer has a digest, in either `docker images --digests` or `docker inspect`.
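The steps above can be scripted for re-testing on other versions. This is a minimal Python sketch, not part of the original report: it assumes the `docker` CLI is on `PATH`, uses `docker save -o` / `docker load -i` instead of shell redirection, and the helper names (`docker`, `parse_repo_digests`, `reproduce`) are hypothetical.

```python
import json
import subprocess

# Digest-pinned reference taken from the steps above.
REF = "alpine@sha256:4f2d8bbad359e3e6f23c0498e009aaa3e2f31996cbea7269b78f92ee43647811"


def docker(*args: str) -> str:
    """Run a docker CLI command and return its stdout (raises on failure)."""
    result = subprocess.run(["docker", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def parse_repo_digests(inspect_output: str) -> list:
    """Parse output of `docker image inspect --format '{{json .RepoDigests}}'`."""
    return json.loads(inspect_output) or []  # `null` and `[]` both mean "none"


def reproduce(ref: str = REF, tarball: str = "test.tar") -> None:
    docker("pull", ref)
    image_id = docker("image", "inspect", "--format", "{{.Id}}", ref).strip()
    print("before:", parse_repo_digests(
        docker("image", "inspect", "--format", "{{json .RepoDigests}}", ref)))
    docker("save", "-o", tarball, ref)
    docker("rmi", ref)
    docker("load", "-i", tarball)
    # The digest reference may no longer resolve after the load, so look the
    # image up by its ID, which save/load preserves from Docker 1.10 onwards.
    print("after: ", parse_repo_digests(
        docker("image", "inspect", "--format", "{{json .RepoDigests}}", image_id)))
```

Calling `reproduce()` on a host with Docker prints RepoDigests before and after the round trip; per the report above, the second list comes back empty.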
Describe the results you received:
The digest disappears (and RepoTags and RepoDigests are empty).

Describe the results you expected:
RepoTags and RepoDigests remain the same.

Additional information you deem important (e.g. issue happens only occasionally):

@thaJeztah
Member

ping @aaronlehmann ptal

@aaronlehmann
Contributor

The digest field is only populated for images that are pulled by digest, so loading an image from a tar won't fill it.

I'm not sure the field could be carried through on save/load because of the security implications. A saved image could lie about the manifest's digest, since the manifest is not part of the saved image.

Generally, it's better to use the image ID to check image provenance. The image ID is a secure hash over the image itself, whereas a RepoDigests entry is a hash over the manifest received from the registry.
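To make the distinction concrete, a minimal Python sketch: the image ID is a SHA-256 over the image config JSON, while a RepoDigests entry pairs the repository name with a SHA-256 over the manifest bytes served by the registry. The byte strings are placeholders rather than real Docker documents, and the function names are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Format a digest the way Docker prints it: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def image_id(config_json: bytes) -> str:
    # Hash over the image config, which the engine keeps locally.
    return sha256_digest(config_json)


def repo_digest(repository: str, manifest_json: bytes) -> str:
    # Hash over the registry manifest, which (per the comment above)
    # is not part of a saved image.
    return f"{repository}@{sha256_digest(manifest_json)}"


# Placeholder documents, not real Docker JSON.
config = b'{"rootfs": {"type": "layers", "diff_ids": []}}'
manifest = b'{"schemaVersion": 2, "layers": []}'

print(image_id(config))
print(repo_digest("alpine", manifest))
```

Because the two hashes cover different documents, knowing one does not let you recompute the other.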

@liron-l
Contributor Author

liron-l commented Apr 22, 2016

Thanks for the clarification @aaronlehmann, a few follow-ups:

  1. Any reason not to include the RepoDigests field in regular pulls?
  2. Image ID is not persistent across machines, so I'm not sure how it is usable to verify trust (especially since trust is established via the digest).
  3. Why not include the original manifest as part of the image specification?

@aaronlehmann
Contributor

> Any reason not to include the RepoDigests field in regular pulls?

We probably should, but IIRC there are some user interface issues around these digests that should be cleared up first.

> Image ID is not persistent across machines, so I'm not sure how it is usable to verify trust (especially since trust is established via the digest)

If you save and load an image, the ID shouldn't change, so it should be possible to use it in this situation.

> Why not include the original manifest as part of the image specification?

The manifest references compressed layers. The layers don't get compressed until docker push (they get compressed on the fly). So the manifest has to be created when an image is pushed.

We've explored compressing layers when they are first created to make it possible for manifests to be persistent like you suggest, but there are two issues with doing this:

  • docker build would become a lot slower, even if you never push the resulting image to a registry.
  • Images would use up to twice as much disk space, because the compressed version of each layer has to be stored as well as the usable format of the layer.

Frankly, it's a tradeoff that we've always struggled with. What you're suggesting would definitely have advantages for extending the content trust chain to subsequent pushes and saves. But the disadvantages I mentioned would probably cause some backlash.
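The reason a manifest cannot simply be rebuilt later is that the registry addresses the compressed blob, and gzip output is not uniquely determined by its input. A minimal Python sketch, with a stand-in byte string instead of a real layer tar, compresses the same content twice with different gzip header timestamps: the uncompressed digest (what the image config records as the layer's diff ID) is unchanged, while the compressed digest differs. Other settings, such as the compression level or the compressor implementation, can change the output the same way.

```python
import gzip
import hashlib


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Stand-in for an uncompressed layer.tar (not a real tar archive).
layer = b"layer contents " * 1000

# Same content and level; only the 4-byte timestamp in the gzip header differs.
blob_a = gzip.compress(layer, compresslevel=6, mtime=0)
blob_b = gzip.compress(layer, compresslevel=6, mtime=1)

print("uncompressed:", digest(layer))
print("compressed a:", digest(blob_a))
print("compressed b:", digest(blob_b))
```

Both blobs decompress to the same bytes, so an engine that stores only uncompressed layers has no way to know which compressed variant the registry served.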

@liron-l
Contributor Author

liron-l commented Apr 23, 2016

Thanks for the clarifications @aaronlehmann.

  1. I tested it and the image ID is not persistent across Docker versions, e.g., if you save an alpine image in Docker 1.10 and load it in Docker 1.11, the ID is different.
  2. Is it possible to include the sha256 of each layer.tar inside the manifest, and also include the original manifest in the image content (layer.tar should be persistent across machines, no?)? This would enable validating the content of local images.
  3. Alternatively, given a locally stored image, is there a way to produce the digest we get from the registry?

Thanks.

@aaronlehmann
Contributor

> I tested it and the image ID is not persistent across Docker versions, e.g., if you save an alpine image in Docker 1.10 and load it in Docker 1.11, the ID is different.

That's strange. @tonistiigi do you know why this would happen?

> Is it possible to include the sha256 of each layer.tar inside the manifest, and also include the original manifest in the image content (layer.tar should be persistent across machines, no?)? This would enable validating the content of local images.

This is essentially how save/load work with Docker 1.10 and up. There is a manifest.json file that contains the sha256 hashes of the layer tar files. That's why I'm surprised you're seeing the ID change between Docker 1.10 and Docker 1.11.
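That manifest.json can be checked by hand. A minimal Python sketch, assuming the Docker 1.10+ `docker save` layout: a top-level `manifest.json` list whose entries have `Config` and `Layers` paths, and a config JSON whose `rootfs.diff_ids` lists the SHA-256 of each uncompressed layer tar in order. It recomputes those digests from an archive. The function names are hypothetical, and newer Docker releases may lay the archive out differently.

```python
from __future__ import annotations

import hashlib
import json
import tarfile


def sha256_of(tar: tarfile.TarFile, name: str) -> str:
    """SHA-256 of one archive member, read in 1 MiB chunks."""
    member = tar.extractfile(name)
    if member is None:
        raise ValueError(f"{name} is not a regular file")
    h = hashlib.sha256()
    for chunk in iter(lambda: member.read(1 << 20), b""):
        h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_saved_image(path: str) -> list[str]:
    """Check every image in a `docker save` archive; return the image IDs."""
    ids = []
    with tarfile.open(path) as tar:
        for entry in json.load(tar.extractfile("manifest.json")):
            config_name = entry["Config"]
            config = json.load(tar.extractfile(config_name))
            diff_ids = config["rootfs"]["diff_ids"]
            if len(diff_ids) != len(entry["Layers"]):
                raise ValueError(f"layer count mismatch in {config_name}")
            for layer_name, expected in zip(entry["Layers"], diff_ids):
                actual = sha256_of(tar, layer_name)
                if actual != expected:
                    raise ValueError(f"{layer_name}: {actual} != {expected}")
            # The image ID is the digest of the config file itself.
            ids.append(sha256_of(tar, config_name))
    return ids
```

This validates the layers and the ID, but, as noted in this thread, not the registry digest, because the archive holds no compressed blobs or registry manifest.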

> Alternatively, given we have a stored local image, is there a way to produce the digest we get from the registry?

Unfortunately there isn't, because the registry operates in terms of compressed layers.

@tonistiigi
Member

> > I tested it and the image ID is not persistent across Docker versions, e.g., if you save an alpine image in Docker 1.10 and load it in Docker 1.11, the ID is different.
>
> That's strange. @tonistiigi do you know why this would happen?

Tested saving alpine:latest on v1.10 and loading it into an empty v1.11. The ID was the same.

@liron-l
Contributor Author

liron-l commented Apr 26, 2016

I'm sorry @tonistiigi, my machines were running Docker v1.9.1 and Docker v1.11.1 (I also reproduced the issue on clean machines).

@tonistiigi
Member

@liron-l This is expected. v1.10 introduced content addressability and changed how image IDs are calculated. Only v1.10+ has the ID stability guarantees.

@ashb

ashb commented May 26, 2016

This also surprised me at first. I've started looking into the content addressability changes from 1.10 and wanted to check all the IDs myself locally.

A bit of background that might be useful if you don't work closely with the docker codebase:

  • The image ID is calculated as the sha256 sum of the canonicalized config JSON (which also includes the sha256 sums of the uncompressed layer.tars that make up the filesystem). This can be recomputed and verified, which is why you get the same ID when you do docker load (at least from 1.10 onwards; I can't say about earlier versions).
  • The distribution id is the sha256 sum of the manifest. v2.2 of the manifest schema (I didn't look at v2.1) contains similar information in a different format. The key difference though is that this checksum is over the compressed data as stored in the registry and as served to the client. This is so that the client can verify the checksum and expected length before uncompressing any layers.
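The length-then-checksum check described in the second point can be sketched in Python as follows. The descriptor dict mimics a layer entry of a schema 2 manifest (`mediaType`, `size`, `digest`); the function name is hypothetical, and unlike a real client, which would typically verify while streaming, this sketch buffers the whole blob in memory.

```python
import gzip
import hashlib


def verify_blob(descriptor: dict, blob: bytes) -> bytes:
    """Check a downloaded blob's size and digest, then decompress it."""
    if len(blob) != descriptor["size"]:
        raise ValueError("size mismatch")
    algorithm, _, expected_hex = descriptor["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm {algorithm!r}")
    if hashlib.sha256(blob).hexdigest() != expected_hex:
        raise ValueError("digest mismatch")
    # Only verified bytes are handed to the decompressor.
    return gzip.decompress(blob)


blob = gzip.compress(b"stand-in layer tar", mtime=0)
desc = {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "size": len(blob),
    "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
}
print(verify_blob(desc, blob))
```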

The kicker, as other people have mentioned, is that the distribution digest sums over compressed tar files, which the Docker engine doesn't store, and their exact size depends on the compression settings used (and possibly other things, like which version of libz the client has?).

It's not as simple as including the compression settings used to generate the compressed tar so that a client could reconstruct things, is it?

@codablock

Just stumbled across the same problem. We have to use docker save/load to distribute Docker images as we don't have access to the internet on the destination hosts. We also have to make sure that the images on the destination hosts end up being exactly the same that were used while bundling the application. Content addressable images sounded perfect for this, but due to this issue this does not work as we hoped.
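One approach in line with the earlier advice to use the image ID for provenance (stable across save/load from Docker 1.10 on) is to record IDs when bundling and compare them after `docker load`. A minimal Python sketch, assuming the `docker` CLI is available on both hosts; the lockfile format and function names are hypothetical. The lockfile is keyed by tag (for example a hypothetical `myapp:1.0`), because digest references are exactly what this issue says does not survive the round trip.

```python
from __future__ import annotations

import json
import subprocess


def local_image_id(ref: str) -> str:
    """Image ID ("sha256:...") of a local image, or "" if it is missing."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Id}}", ref],
        capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else ""


def write_lockfile(refs: list[str], path: str) -> None:
    """On the bundling host: record the ID of each tagged image."""
    ids = {ref: local_image_id(ref) for ref in refs}
    missing = [ref for ref, image_id in ids.items() if not image_id]
    if missing:
        raise RuntimeError(f"images not found locally: {missing}")
    with open(path, "w") as f:
        json.dump(ids, f, indent=2)


def find_mismatches(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Refs whose local ID is missing or differs from the recorded one."""
    return sorted(ref for ref, image_id in expected.items()
                  if actual.get(ref) != image_id)


def check_lockfile(path: str) -> list[str]:
    """On the target host, after `docker load`: compare against the lockfile."""
    with open(path) as f:
        expected = json.load(f)
    return find_mismatches(expected, {ref: local_image_id(ref) for ref in expected})
```

An empty list from `check_lockfile` means every tagged image on the target host has the recorded ID.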

@pacoxu
Contributor

pacoxu commented Mar 7, 2017

A simple workaround (it only works when the machine can access the repository):

Write a simple Dockerfile:

```
FROM <image>:<tag>
```

and run:

```
docker build .
```

Then the digest will be generated.

@thaJeztah As of Docker 17.03.0-ce, the service create logic uses the image digest, so this issue is quite important for those who use the save and load commands.

@thaJeztah
Member

ping @nishanttotla @aaronlehmann (this came up in a discussion recently)

notnoop pushed a commit to hashicorp/nomad that referenced this issue Dec 15, 2018
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digest is not supported:
moby/moby#22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24`) do honor `-p 0`
as the tests expect, but I would rather update busybox to fix.
@macdjord

macdjord commented Jun 5, 2021

Is there any progress on this? It's blocking our workflow. We're trying to develop an offline installer for our app, but some of the images we use are specified by digest.

@macdjord

This is no longer blocking us, because we've found a workaround, but it's a bit of a hack:

  • When we build the installer, we search all our compose files, looking for images which specify a digest (e.g. foo@sha256:bar)
  • We pull the required image using the specified digest
  • We generate a unique tag from the digest (e.g. sha256:bar -> digest_sha256_bar)
  • The digest-specified image is retagged with the new tag (e.g. docker tag 'foo@sha256:bar' 'foo:digest_sha256_bar')
  • The image is included in the images bundle using this new tag
  • The copy of the compose file in the installer is edited to use the tag instead of the digest
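The retagging steps above could look roughly like this in Python. It is a sketch under stated assumptions: the regex only recognises `name@sha256:<64 hex>` references whose name has no tag and no registry port, the `digest_sha256_<hex>` tag scheme follows the example above, and the helper names are hypothetical. It rewrites compose files as plain text rather than parsing YAML.

```python
from __future__ import annotations

import re

# "repo@sha256:<64 hex chars>"; repo may contain '/', '.', '-' and '_'.
DIGEST_REF = re.compile(r"(?P<repo>[\w./-]+)@sha256:(?P<hex>[0-9a-f]{64})")


def tagged_ref(match: re.Match) -> str:
    """foo@sha256:<hex> -> foo:digest_sha256_<hex>"""
    return f"{match.group('repo')}:digest_sha256_{match.group('hex')}"


def plan_retags(compose_text: str) -> tuple[list[list[str]], str]:
    """Return pull/tag commands and the compose text rewritten to use tags."""
    commands: list[list[str]] = []
    for m in DIGEST_REF.finditer(compose_text):
        source, target = m.group(0), tagged_ref(m)
        for cmd in (["docker", "pull", source],
                    ["docker", "tag", source, target]):
            if cmd not in commands:  # same image may appear in several services
                commands.append(cmd)
    return commands, DIGEST_REF.sub(tagged_ref, compose_text)
```

The returned commands can be run with `subprocess.run`, and the new `name:digest_sha256_<hex>` tags passed to `docker save` when building the bundle.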

@seahurt

seahurt commented Jun 29, 2023

I want to pull image:new, but it is too big. I only have a tar file of the old image, saved with docker save. After loading the old image, I expected pulling image:new to be fast, but it is not.

When I run docker pull image:old, it finishes immediately, but the layer hashes are still missing.

Would it be possible for docker pull image:old to fill in the layer hashes, so that I can pull the newer image quickly?

@timthelion
Contributor

Since image IDs are now stable, would it be possible to allow us to do @ in docker compose, so that we wouldn't need to pull the digests through?

@TryingToCodeSomething

> Just stumbled across the same problem. We have to use docker save/load to distribute Docker images as we don't have access to the internet on the destination hosts. We also have to make sure that the images on the destination hosts end up being exactly the same that were used while bundling the application. Content addressable images sounded perfect for this, but due to this issue this does not work as we hoped.

@codablock Were you able to solve this issue? I am facing the same problem. We need to deliver Docker images into an air-gapped environment, and the image digest is important for the integrity check. Any updates from your side would be helpful. Thanks.

@thaJeztah thaJeztah added the containerd-integration Issues and PRs related to containerd integration label Oct 21, 2024