New docker-container builders fail first bake using local cache #1325

Closed

rcwbr opened this issue Sep 20, 2022 · 16 comments
Labels
area/buildkit kind/bug Something isn't working

Comments

@rcwbr

rcwbr commented Sep 20, 2022

Behavior

Using a fresh buildx docker-container builder, a bake that uses a (populated) local cache and a build context (e.g. COPY, RUN --mount) fails with either ERROR: failed to solve: Canceled: grpc: the client connection is closing or ERROR: failed to solve: Unavailable: error reading from server: EOF.

Desired behavior

For practical use of the local cache in CI applications, the first build with a fresh builder must succeed against a local cache. With a builder that has already baked the images, the issue becomes intermittent; that case should also succeed consistently.
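
To illustrate, the CI shape in question looks roughly like this (builder name and paths are illustrative; the cache directory is assumed to be restored by the CI system from a previous run):

# every CI job starts from a clean slate, so the builder is always fresh
docker buildx create --name ci_builder --driver docker-container

# ./cache was populated by a previous pipeline run and restored by the CI cache;
# this first bake against it is the step that currently fails
docker buildx bake --builder ci_builder -f images.json layer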

Environment

docker info:

Client:
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.10.2)
  extension: Manages Docker extensions (Docker Inc., v0.2.9)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.19.0)

Server:
 Containers: 13
  Running: 4
  Paused: 0
  Stopped: 9
 Images: 59
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.124-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 7.667GiB
 Name: docker-desktop
 ID: P2BC:5HXV:5ELQ:YK6I:LRNJ:PVRL:FJ76:EZ7P:H2QB:QVXD:ON2C:AUVO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

Steps to reproduce

Prepare the files (or unzip the attached image_defs.zip below to skip these steps):

$ mkdir base
$ mkdir layer
$ touch base/Dockerfile
$ touch base/file
$ touch layer/Dockerfile
$ touch images.json

base/Dockerfile:

FROM ubuntu as base

RUN sleep 2

COPY file file

layer/Dockerfile:

FROM base_target as layer

RUN sleep 5

images.json:

{
  "target": { 
    "common": {
      "platforms": [
        "linux/amd64"
      ]
    },
    "base": {
      "context": "base",
      "cache-from": [
        "type=local,src=../cache/base"
      ],
      "cache-to": [
        "type=local,mode=max,dest=../cache/base"
      ],
      "inherits": ["common"],
      "tags": [
        "base"
      ]
    },
    "layer": {
      "context": "layer",
      "cache-from": [
        "type=local,src=../cache/layer"
      ],
      "cache-to": [
        "type=local,mode=max,dest=../cache/layer"
      ],
      "contexts": {
        "base_target": "target:base"
      },
      "inherits": ["common"],
      "tags": [
        "layer"
      ]
    }
  }
}

Create the builder:

docker buildx create --name container_driver_builder --driver docker-container

Populate the cache:

docker buildx bake --builder container_driver_builder -f images.json layer

For each subsequent test, remove the builder, recreate it, and rebuild the bake targets:

docker buildx rm container_driver_builder \
  && docker buildx create --name container_driver_builder --driver docker-container \
  && docker buildx bake --builder container_driver_builder -f images.json layer

Each such test fails with ERROR: failed to solve: Canceled: grpc: the client connection is closing or ERROR: failed to solve: Unavailable: error reading from server: EOF.

image_defs.zip

@tonistiigi
Member

I got a stacktrace to better understand this issue

2022/09/21 02:16:41 CalcSlowCache Canceled: grpc: the client connection is closing: unknown
1 v0.10.0-583-g3fab38923.m buildkitd --debug
github.com/moby/buildkit/session/content.(*callerContentStore).ReaderAt
	/src/session/content/caller.go:81
github.com/moby/buildkit/util/contentutil.(*MultiProvider).ReaderAt
	/src/util/contentutil/multiprovider.go:78
github.com/moby/buildkit/util/pull/pullprogress.(*ProviderWithProgress).ReaderAt
	/src/util/pull/pullprogress/progress.go:28
github.com/moby/buildkit/util/contentutil.(*localFetcher).Fetch
	/src/util/contentutil/copy.go:29
github.com/moby/buildkit/util/resolver/limited.(*fetcher).Fetch
	/src/util/resolver/limited/group.go:113
github.com/containerd/containerd/remotes.fetch
	/src/vendor/github.com/containerd/containerd/remotes/handlers.go:141
github.com/containerd/containerd/remotes.FetchHandler.func1
	/src/vendor/github.com/containerd/containerd/remotes/handlers.go:103
github.com/moby/buildkit/util/resolver/retryhandler.New.func1
	/src/util/resolver/retryhandler/retry.go:25
github.com/moby/buildkit/util/contentutil.Copy
	/src/util/contentutil/copy.go:18
github.com/moby/buildkit/cache.lazyRefProvider.Unlazy.func1
	/src/cache/remote.go:335
github.com/moby/buildkit/util/flightcontrol.(*call).run
	/src/util/flightcontrol/flightcontrol.go:121
sync.(*Once).doSlow
	/usr/local/go/src/sync/once.go:74
sync.(*Once).Do
	/usr/local/go/src/sync/once.go:65
runtime.goexit
	/usr/local/go/src/runtime/asm_arm64.s:1172

time="2022-09-21T02:16:41Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = failed to compute cache key: Canceled: grpc: the client connection is closing: unknown"
failed to compute cache key: Canceled: grpc: the client connection is closing: unknown
1 v0.10.0-583-g3fab38923.m buildkitd --debug
github.com/moby/buildkit/session/content.(*callerContentStore).ReaderAt
	/src/session/content/caller.go:81
github.com/moby/buildkit/util/contentutil.(*MultiProvider).ReaderAt
	/src/util/contentutil/multiprovider.go:78
github.com/moby/buildkit/util/pull/pullprogress.(*ProviderWithProgress).ReaderAt
	/src/util/pull/pullprogress/progress.go:28
github.com/moby/buildkit/util/contentutil.(*localFetcher).Fetch
	/src/util/contentutil/copy.go:29
github.com/moby/buildkit/util/resolver/limited.(*fetcher).Fetch
	/src/util/resolver/limited/group.go:113
github.com/containerd/containerd/remotes.fetch
	/src/vendor/github.com/containerd/containerd/remotes/handlers.go:141
github.com/containerd/containerd/remotes.FetchHandler.func1
	/src/vendor/github.com/containerd/containerd/remotes/handlers.go:103
github.com/moby/buildkit/util/resolver/retryhandler.New.func1
	/src/util/resolver/retryhandler/retry.go:25
github.com/moby/buildkit/util/contentutil.Copy
	/src/util/contentutil/copy.go:18
github.com/moby/buildkit/cache.lazyRefProvider.Unlazy.func1
	/src/cache/remote.go:335
github.com/moby/buildkit/util/flightcontrol.(*call).run
	/src/util/flightcontrol/flightcontrol.go:121
sync.(*Once).doSlow
	/usr/local/go/src/sync/once.go:74
sync.(*Once).Do
	/usr/local/go/src/sync/once.go:65
runtime.goexit
	/usr/local/go/src/runtime/asm_arm64.s:1172

1 v0.10.0-583-g3fab38923.m buildkitd --debug
github.com/moby/buildkit/solver.(*edge).createInputRequests.func1.1
	/src/solver/edge.go:842
github.com/moby/buildkit/solver/internal/pipe.NewWithFunction.func2
	/src/solver/internal/pipe/pipe.go:82
runtime.goexit
	/usr/local/go/src/runtime/asm_arm64.s:1172

1 v0.10.0-583-g3fab38923.m buildkitd --debug
main.unaryInterceptor.func1
	/src/cmd/buildkitd/main.go:572
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
	/src/api/services/control/control.pb.go:1718
google.golang.org/grpc.(*Server).processUnaryRPC
	/src/vendor/google.golang.org/grpc/server.go:1283
google.golang.org/grpc.(*Server).handleStream
	/src/vendor/google.golang.org/grpc/server.go:1620
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/src/vendor/google.golang.org/grpc/server.go:922
runtime.goexit
	/usr/local/go/src/runtime/asm_arm64.s:1172

So the case is that the first build loads the cache but it remains only a lazy ref https://github.com/moby/buildkit/blob/v0.10.4/cache/remote.go#L336 created with a provider from the session. Then a second build comes in after the first session has already been dropped and matches against the previous lazy ref. Then unlazy gets called and fails because the session is already gone.

@sipsma @ktock

I guess the simplest fix is to try to disable lazy behavior for local cache imports from session because it seems fragile.

More proper fixes would be to make sure the lazy ref is not matched if it is from a different session, or to add the current session to the group (not sure if this is quite safe actually).

On bake we might need a fix as well to keep the original session alive until all builds have completed. I'm thinking of the case where a "local source" would need to be pulled in by a subsequent build (not sure how practical that is). But I think this cache issue could appear just by doing two individual builds with the same cache source from two different terminals.
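
Roughly, that bake-less variant would be (untested sketch; the cache path and context follow the repro above):

# first build imports the local cache; matched blobs stay behind as lazy refs tied to this build's session
docker buildx build --builder container_driver_builder --cache-from type=local,src=cache/base base

# a second, separate build matches the same lazy ref, but the session that could supply its content is gone
docker buildx build --builder container_driver_builder --cache-from type=local,src=cache/base base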

@jaudiger

jaudiger commented Feb 3, 2023

Not sure if it's related to this issue, but since the upgrade to Docker version 23.0.0, which embeds buildx 0.10.2 as its default builder, some people are encountering issues when building a devcontainer (a feature of VS Code).

The error Fail to build a devcontainer: ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF seems to be related to the one shown in the description. Could someone confirm whether it's the same issue or a completely different one?

@crazy-max
Member

@jaudiger I don't think this is related. Can you show the output of docker buildx ls? If you have a simple repro with the Dockerfile, the build command and logs that would be handy.

@jaudiger

jaudiger commented Feb 3, 2023

@crazy-max While working on a small repro, I found it can be reproduced with this Dockerfile (Dockerfile) and this command:

docker buildx build --build-arg BUILDKIT_INLINE_CACHE=1 -f ./Dockerfile.txt -t test --target bar ./
[+] Building 1.4s (5/6)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 158B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/fedora:37               1.3s
 => [foo 1/1] FROM docker.io/library/fedora:37@sha256:3487c98481d1bba7e769cf7bcecd6343c2d383fdd6bed34ec541b6b23ef07664  0.0s
 => CACHED [bar 1/1] RUN echo "From inside"                                0.0s
 => preparing layers for inline cache                                      0.1s
ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF

I found the culprit: if I remove the BUILDKIT_INLINE_CACHE=1 option, the image builds, but with it, I get the error above.
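
A minimal Dockerfile matching the output above would be something like this (a sketch inferred from the log; the attached file may differ):

FROM fedora:37 as foo

FROM foo as bar

RUN echo "From inside"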

@topletal

topletal commented Feb 3, 2023

Not sure if it's related to this issue, but since the upgrade to Docker version 23.0.0, which embeds buildx 0.10.2 as its default builder, some people are encountering issues when building a devcontainer (a feature of VS Code).

The error Fail to build a devcontainer: ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF seems to be related to the one shown in the description. Could someone confirm whether it's the same issue or a completely different one?

I can confirm this new error. It happens on GitLab CI/CD with the docker:dind service when building a Docker container. It started happening today, with no changes to any Docker/CI or related files.

@bwenzel2

bwenzel2 commented Feb 3, 2023

We're also seeing the same error suddenly appear despite not having changed anything CI/CD or Docker related on our end for months. GitLab CI/CD running dind with Docker 20.10.13 and BUILDKIT_INLINE_CACHE=1. Following @jaudiger's suggestion above, removing the BUILDKIT_INLINE_CACHE=1 build arg does seem to fix the issue, but I'm curious why this would suddenly break without us having upgraded or changed anything on our end. Wondering if something changed on the Docker Hub side, since that's where all our repos are.

@m-melis

m-melis commented Feb 3, 2023

Confirmed same error here.

@antonioconselheiro

antonioconselheiro commented Feb 3, 2023

Same error here. It worked fine yesterday, and today, launching the devcontainer, I get the errors "ERROR: failed to solve: Unavailable: error reading from server: EOF" and "ERROR: failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF". I scoured the internet looking for a solution, but so far I haven't found anything that helps; I even reformatted my Ubuntu install but the error persists.

error logs:
[screenshot of the error output]

on this repo:
https://github.com/antonioconselheiro/bater-ponto

It happens only when launching the devcontainer via the Microsoft extension for VS Code (using devcontainer open .).

@crazy-max
Member

@jaudiger @bwenzel2 @m-melis @antonioconselheiro Same as moby/buildkit#3576, will be fixed with moby/moby#44920.

Closing this issue since it has been fixed in BuildKit 0.11.2 (moby/buildkit#3493)

@denibertovic

@crazy-max has there been a new docker image release with the fix?

@crazy-max
Member

crazy-max commented Feb 8, 2023

@denibertovic For this issue, it's already fixed in BuildKit 0.11.2, which has been released. For moby/moby#44920 it will be in the next Moby patch release (23.0.1).

@denibertovic

@crazy-max I understand. I'm using the official Docker images from Docker Hub in my CI/CD pipeline, and from what I can tell 23.0.1 has not been released yet. The latest one pushed seems to be 23.0.0.

@mathemaphysics

mathemaphysics commented Feb 8, 2023

This issue completely breaks my ability to build and run any Docker devcontainer in VS Code on Ubuntu 22.04. I've been able to block it in devcontainer.json using the args: {} section; it takes effect when I add BUILDKIT_INLINE_CACHE=0.
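
In devcontainer.json that looks roughly like this (a sketch for a Dockerfile-based devcontainer; adjust to your own setup):

{
  "build": {
    "dockerfile": "Dockerfile",
    "args": {
      "BUILDKIT_INLINE_CACHE": "0"
    }
  }
}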

This was a new install, the first time in a while. This will probably seriously disrupt a lot of people who use devcontainers in Docker.

@razilevin

This issue completely breaks my ability to build and run any Docker devcontainer in VS Code on Ubuntu 22.04. I've been able to block it in devcontainer.json using the args: {} section; it takes effect when I add BUILDKIT_INLINE_CACHE=0.

This was a new install, the first time in a while. This will probably seriously disrupt a lot of people who use devcontainers in Docker.

Damn where were you when this issue first started and I was editing VSC code LOL

@anthonyalayo

anthonyalayo commented Jul 10, 2023

Is this still expected to be a problem? I'm encountering this using:

  • Docker for Mac
  • Buildkit
  • Cache Mounts
  • A C++ build using an Ubuntu 22 base

When I do the exact same build on a Linux VM, I don't hit this issue.
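
For reference, the cache mount pattern in question looks roughly like this (an illustrative sketch, not the actual Dockerfile):

FROM ubuntu:22.04 AS build

RUN apt-get update && apt-get install -y build-essential cmake

WORKDIR /src
COPY . .

# the build tree lives in a BuildKit cache mount so rebuilds are incremental
RUN --mount=type=cache,target=/src/build \
    cmake -S . -B build && cmake --build build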

@sanfx

sanfx commented Jun 23, 2024

@denibertovic For this issue, it's already fixed in BuildKit 0.11.2, which has been released. For moby/moby#44920 it will be in the next Moby patch release (23.0.1).

I am still getting the error even though BuildKit in my case is v0.13.2:

sudo docker buildx ls 
NAME/NODE     DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*      docker                                 
 \_ default    \_ default       running   v0.13.2    linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386
