
Can't use kaniko with alpine v3.12+ due to /var/run behavior #1297

Open
bobcatfish opened this issue Jun 4, 2020 · 42 comments
Labels
area/behavior (all bugs related to kaniko behavior, like running as root), area/symlinks, categorized, differs-from-docker, image/alpine, issue/failed-to-rename, issue/mount, kind/bug (Something isn't working), priority/p1 (Basic need: feature compatibility with docker build; we should be working on this next), priority/p2 (High impact feature/bug. Will get a lot of users happy), work-around-available, works-with-docker

Comments

@bobcatfish
Contributor

Actual behavior

Tekton is using Kaniko to build a Docker image from alpine and recently the builds started failing.

TL;DR

The alpine:3.12 image has /var/run aliased to /run. When running kaniko in a Kubernetes pod with service accounts, the service account tokens often end up mounted under /var/run.

Kaniko ignores the contents and state of /var/run in the base image (alpine:3.12), but unfortunately parts of alpine depend on /var/run being a symlink to /run, so not preserving that causes upgrades of alpine packages to fail.
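To make the aliasing concrete, here's a minimal shell sketch; the paths under a temp dir are stand-ins for the real /var/run and /run:

```shell
#!/bin/sh
# Stand-in for alpine's layout: var_run is a symlink to run.
tmp=$(mktemp -d)
mkdir "$tmp/run"
ln -s run "$tmp/var_run"     # stand-in for /var/run -> /run

# Anything written "into" var_run actually lands under run:
echo token > "$tmp/var_run/secret"
ls "$tmp/run"                # shows: secret

# readlink reports the target the baselayout scripts expect to find:
readlink "$tmp/var_run"      # prints: run
rm -rf "$tmp"
```

When Kubernetes mounts the service account under /var/run, the mount forces /var/run to be a real directory, which is exactly the state the baselayout scripts don't expect.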

Details

We discovered this in tektoncd/pipeline#2738.

It seems the problem is caused by recent versions of alpine-baselayout in alpine 3.12. When we build from alpine 3.12 and upgrade all alpine packages, the alpine-baselayout upgrade fails:

(1/1) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/kubernetes.io/serviceaccount/..data': Read-only file system

Expected behavior

Kaniko should detect that /var/run is a symlink in the base image and preserve that. (I think! I'm not sure if it's that simple.)
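A hedged sketch of the detection being asked for — this is not kaniko code, just the shell equivalent of an lstat-style check that distinguishes a symlinked /var/run from a real directory:

```shell
#!/bin/sh
# Hypothetical check: when extracting a base image, test whether the
# path is a symlink (test -L does not follow the link) before deciding
# to recreate it as a plain directory.
tmp=$(mktemp -d)
mkdir "$tmp/run"
ln -s run "$tmp/var_run"        # mimic alpine's /var/run -> /run

if [ -L "$tmp/var_run" ]; then
    echo "symlink -> $(readlink "$tmp/var_run"); preserve it"
else
    echo "real directory; safe to replace"
fi
rm -rf "$tmp"
```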

To Reproduce

Using this dockerfile and mounting a file into /var/run, I can build with docker but not with Kaniko.

Trying to build with kaniko:

docker run -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json gcr.io/kaniko-project/executor:v0.17.1 --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile --destination=gcr.io/christiewilson-catfactory/pipeline-release-test --context=/workspace/go/src/github.com/tektoncd/pipeline -v debug
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/SECRET.json': Resource busy
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.

The error above about not being able to remove the file seems to come from https://git.alpinelinux.org/aports/tree/main/alpine-baselayout/alpine-baselayout.pre-upgrade, which works just fine if /var/run is a symlink to /run. I discovered this by trying to do the same thing with the alpine image directly, without kaniko:

docker run --entrypoint /bin/ash -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro alpine:3.12 -c "apk update && apk upgrade alpine-baselayout"

That works just fine!
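One plausible reason the symlink case succeeds: replacing /var/run then only requires unlinking the symlink itself, which never touches the files under the target, so a read-only or busy mount below /run doesn't get in the way. A minimal sketch (temp paths are stand-ins for the real layout):

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/run/secrets"
touch "$tmp/run/secrets/SECRET.json"   # stand-in for the mounted secret
ln -s run "$tmp/var_run"               # stand-in for /var/run -> /run

rm "$tmp/var_run"                      # removes only the link itself
[ -f "$tmp/run/secrets/SECRET.json" ] && echo "target untouched"
rm -rf "$tmp"
```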

I tried not whitelisting /var/run and that didn't work either:

docker run -v `pwd`:/workspace/go/src/github.com/tektoncd/pipeline -v `pwd`/SECRET.json:/var/run/secrets/SECRET.json:ro -e GOOGLE_APPLICATION_CREDENTIALS=/workspace/go/src/github.com/tektoncd/pipeline/SECRET.json gcr.io/kaniko-project/executor:v0.17.1 --dockerfile=/workspace/go/src/github.com/tektoncd/pipeline/images/Dockerfile --destination=gcr.io/christiewilson-catfactory/pipeline-release-test --context=/workspace/go/src/github.com/tektoncd/pipeline --whitelist-var-run=false -v debug
error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/SECRET.json: device or resource busy

Finally, using docker to build the image (from the pipelines repo checkout) worked just fine:

pipeline git:(pin_to_stable_alpine) ✗ pwd
/Users/christiewilson/Code/go/src/github.com/tektoncd/pipeline
pipeline git:(pin_to_stable_alpine) ✗ docker build -t poop -f ./images/Dockerfile  .
Sending build context to Docker daemon  150.9MB
Step 1/2 : FROM alpine:3.12
 ---> a24bb4013296
Step 2/2 : RUN apk add --update git openssh-client     && apk update     && apk upgrade alpine-baselayout
 ---> Using cache
 ---> ff08e33b783d
Successfully built ff08e33b783d
Successfully tagged poop:latest

Additional Information

Triage Notes for the Maintainers

Description (Yes/No):
- Please check if this is a new feature you are proposing
- Please check if the build works in docker but not in kaniko
- Please check if this error is seen when you use the --cache flag
- Please check if your dockerfile is a multistage dockerfile
@tejal29
Contributor

tejal29 commented Jun 5, 2020

@bobcatfish I don't think using --whitelist=/var/run is the right approach here.
You would not want your secrets to end up in the image.

I tried building your image like this to inspect what the FS looks like.

docker run -it --entrypoint /busybox/sh -v /Users/tejaldesai/workspace/recreate:/workspace -v /Users/tejaldesai/workspace/keys/tejal-test.json:/var/run/secrets/SECRET.json:ro gcr.io/kaniko-project/executor:debug-v0.23.0
/ # /kaniko/executor --context=dir://workspace --no-push

...
OK: 12729 distinct packages available
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
rm: can't remove '/var/run/secrets/SECRET.json': Resource busy
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
Executing alpine-baselayout-3.2.0-r7.post-upgrade
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
1 error; 27 MiB in 25 packages
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 1

It failed as expected.

The files in the /var/.apk dir are the secret files:

/ # ls -al /var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384/
total 12
drwxr-xr-x    3 root     root          4096 Jun  5 02:18 .
drwxr-xr-x    1 root     root          4096 Jun  5 02:08 ..
drwxr-xr-x    2 root     root          4096 Jun  5 02:07 secrets
/ # 

Another question I had: from the pre-upgrade script, it is not clear where the rename happens.

I removed the read-only secret mounted to /var/run and the build works fine.

docker run -it --entrypoint /busybox/sh -v /Users/tejaldesai/workspace/recreate:/workspace  gcr.io/kaniko-project/executor:debug-v0.23.0

/ # /kaniko/executor --context dir://workspace --no-push
INFO[0000] Retrieving image manifest alpine:3.12        
INFO[0001] Retrieving image manifest alpine:3.12        
INFO[0002] Built cross stage deps: map[]                
INFO[0002] Retrieving image manifest alpine:3.12        
INFO[0004] Retrieving image manifest alpine:3.12        
INFO[0005] Executing 0 build triggers                   
INFO[0005] Unpacking rootfs as cmd RUN apk add --update git openssh-client     && apk update     && apk upgrade requires it. 
INFO[0005] RUN apk add --update git openssh-client     && apk update     && apk upgrade 
INFO[0005] Taking snapshot of full filesystem...        
INFO[0005] Resolving 491 paths                          
INFO[0005] cmd: /bin/sh                                 
INFO[0005] args: [-c apk add --update git openssh-client     && apk update     && apk upgrade] 
INFO[0005] Running: [/bin/sh -c apk add --update git openssh-client     && apk update     && apk upgrade] 
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/11) Installing ca-certificates (20191127-r3)
(2/11) Installing nghttp2-libs (1.41.0-r0)
(3/11) Installing libcurl (7.69.1-r0)
(4/11) Installing expat (2.2.9-r1)
(5/11) Installing pcre2 (10.35-r0)
(6/11) Installing git (2.26.2-r0)
(7/11) Installing openssh-keygen (8.3_p1-r0)
(8/11) Installing ncurses-terminfo-base (6.2_p20200523-r0)
(9/11) Installing ncurses-libs (6.2_p20200523-r0)
(10/11) Installing libedit (20191231.3.1-r0)
(11/11) Installing openssh-client (8.3_p1-r0)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
OK: 27 MiB in 25 packages
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
v3.12.0-45-g0e4d4e3558 [http://dl-cdn.alpinelinux.org/alpine/v3.12/main]
v3.12.0-46-g02e8db0c3e [http://dl-cdn.alpinelinux.org/alpine/v3.12/community]
OK: 12729 distinct packages available
(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
Executing alpine-baselayout-3.2.0-r7.pre-upgrade
Executing alpine-baselayout-3.2.0-r7.post-upgrade
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)
Executing busybox-1.31.1-r16.trigger
Executing ca-certificates-20191127-r3.trigger
OK: 27 MiB in 25 packages
INFO[0008] Taking snapshot of full filesystem...        
INFO[0008] Resolving 1211 paths                         
INFO[0009] RUN ls -al /var/run                          
INFO[0009] cmd: /bin/sh                                 
INFO[0009] args: [-c ls -al /var/run]                   
INFO[0009] Running: [/bin/sh -c ls -al /var/run]        
lrwxrwxrwx    1 root     root             4 Jun  5 02:23 /var/run -> /run
INFO[0009] Taking snapshot of full filesystem...        
INFO[0009] Resolving 1211 paths                         
INFO[0009] No files were changed, appending empty layer to config. No layer added to image. 
INFO[0009] Skipping push to container registry due to --no-push flag 
/ # 

Can the read only secret be mounted in another dir?

Another option is to look into the apk upgrade --no-commit-hooks flag. However, I'm not sure whether that would have any side effects.
I will keep looking for something better.

@olivier-mauras

olivier-mauras commented Jun 5, 2020

Can the read only secret be mounted in another dir?

This wouldn't work on Kubernetes, which mounts the service account secret automatically under /var/run, right?
I tried --no-scripts and --no-commit-hooks, but neither helps.

@olivier-mauras

olivier-mauras commented Jun 5, 2020

So one ugly hack would be to install your package in a different root and then copy it over / in a final scratch image.

I made it work from this minimal Dockerfile directly inside a Kubernetes container:

FROM alpine:3.12 AS SRC                                                                         

RUN set -x; \
    # Actually make the installation in a different root dir
    mkdir -p /proot/etc; \
    \
    apk -p /proot add --initdb && \
    \
    cp -r /etc/apk /proot/etc; \
    \
    apk -p /proot update && \
    apk -p /proot fix && \
    apk -p /proot add curl ca-certificates tzdata zip unzip openssl && \
    \
    <whatever needs to be done> \
    \
    # Clean up
    rm -rf /proot/dev; \
    rm -rf /proot/sys; \
    rm -rf /proot/proc; \
    unlink /proot/var/run; \
    rm -rf /proot/var/cache/apk/*

FROM scratch
COPY --from=SRC /proot/ /
RUN <all the commands you'd have run after your pkg install>

Indeed, this is only a basic workaround, as it will come back to bite you any time you need to install more packages in an image dependent on this one. If you don't have many specific needs, at least it builds an alpine:3.12 :D

@tejal29
Contributor

tejal29 commented Jun 5, 2020

Another hack would be to not fail when apk upgrade fails with the error "ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk....".

Say you create an upgrade script, apk-upgrade.sh:

#!/bin/bash

ERR=$(apk add --update git openssh-client && apk update && apk upgrade alpine-baselayout 2>&1)
EXIT_CODE=$?
# If the upgrade succeeded, we are done.
if [ "$EXIT_CODE" -eq 0 ]; then
  exit 0
fi

PERMISSIBLE_ERR="ERROR: alpine-baselayout-3.2.0-r7: failed to rename"
if [[ "$ERR" == *"$PERMISSIBLE_ERR"* ]]; then
  # Swallow the known rename error
  exit 0
fi
# Probably some other error
exit 1

@olivier-mauras

Wouldn't it be easier for kaniko to extract images and run commands in a separate root context? How difficult would it be to implement?

@tejal29
Contributor

tejal29 commented Jun 5, 2020

After looking at the output of --no-scripts, it looks like the error is actually happening when upgrading alpine-baselayout-3.2.0-r7:

(1/2) Upgrading alpine-baselayout (3.2.0-r6 -> 3.2.0-r7)
ERROR: alpine-baselayout-3.2.0-r7: failed to rename var/.apk.f752bb51c942c7b3b4e0cf24875e21be9cdcd4595d8db384 to var/run.
(2/2) Upgrading ca-certificates-bundle (20191127-r2 -> 20191127-r3)

However it is still installed.

/ # /sbin/apk list | grep alpine-baselayout
alpine-baselayout-3.2.0-r7 x86_64 {alpine-baselayout} (GPL-2.0-only) [installed]

@tejal29
Contributor

tejal29 commented Jun 5, 2020

Wouldn't it be easier for kaniko to extract images and run commands in a separate root context? How difficult would it be to implement?

@olivier-mauras That would involve some major design changes. The way kaniko executes a RUN command is that it actually calls exec.Cmd.Start.
I am not sure how to run the command in a separate root context.
Do you mean we map "/" to "/tmp_run_XXX"?

@olivier-mauras

olivier-mauras commented Jun 5, 2020

Do you mean we map "/" to "/tmp_run_XXX" ?

Yeah, like using chroot, so that you don't have any mixups.
There are problems doing simple things like COPY --from=another / / because kaniko works on its own root and then tries to copy /dev, /sys, /proc and the like.

Would that work? https://golang.org/pkg/syscall/#Chroot

EDIT: https://github.com/GoogleContainerTools/kaniko/blob/master/pkg/commands/run.go#L210 Am I understanding correctly that RootDir could be changed but there's just no option to do so?

@tejal29
Contributor

tejal29 commented Jun 5, 2020

If we choose this approach for every command, i.e. map "/" to another directory in "tmp", then I see 2 issues.

  1. When executing subsequent RUN commands, we need to find all files changed, modified, or deleted.
    If we use this approach, subsequent RUN commands would be independent of each other, which is not the case.
    To ensure that, we would have to copy all changes back to "/" or into the next command's chroot.
    This could introduce delays.
  2. If the RUN command uses commands installed in paths relative to "/", how would that work?

Another approach would be to map "/" to "/tmp/kanikoRootXXX" at the beginning of the build (which is probably what you are suggesting in the edit).
I think that could work, but we would need to do something like this for all the metadata commands like "ENV" and "WORKDIR". Also, for all the base images, we would need to map their ImageConfig.Env paths to be relative to this new chroot.

I don't think it's infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.

@olivier-mauras

Another approach would be to map "/" to "/tmp/kanikoRootXXX" at the beginning of the build. (which is probably what you are suggesting in the edit)

Exactly

I don't think it's infeasible or ugly. It could be a little hard to wire up. I would not mind pursuing this direction.

This would probably solve quite a bunch of COPY issues at once.

@tejal29
Contributor

tejal29 commented Jun 5, 2020

The only caveat is that currently I am the only one working actively on this project, at 20% capacity.
I won't be able to get this in soon. I can definitely help design/review this.

@tejal29 tejal29 added area/behavior all bugs related to kaniko behavior like running in as root kind/bug Something isn't working work-around-available labels Jun 5, 2020
@tibbon

tibbon commented Jun 25, 2020

I'm running into this too. I'm upgrading from a Ruby image that uses 3.10 to 3.12, and I'm hitting this in my GitLab CI. I'm unsure what the best path forward is.

@hughobrien

One particularly quick fix is: apk upgrade --no-cache --ignore alpine-baselayout. Though be warned, apk explicitly says that partial upgrades aren't supported (but at least you can test).

@jpower432

@bobcatfish can you share the workaround that is working for you? We are running into the same issues using Kaniko in our Tekton pipelines.

@bobcatfish
Contributor Author

Hey @jpower432 - our workaround is just to pin to alpine 3.11 tektoncd/pipeline#2757 This works for us because we didn't have any particular need to use 3.12 but won't work for you if you actually need 3.12 :O

@kamikaze

same here

@bitsofinfo

#1127

@bitsofinfo

bitsofinfo commented Jul 16, 2020

I continually have other issues with kaniko when any operation (like COPY) goes against prior targets that are symlinked. The original permissions always get removed; this does not happen with docker build. (v0.24)

@mattsurge

Seeing this issue as well, using kaniko in a gitlab runner. I suppose the solution is to pin all alpine builds we have at 3.11?

@tejal29 tejal29 added the priority/p3 agreed that this would be good to have, but no one is available at the moment. label Aug 12, 2020
@tejal29
Contributor

tejal29 commented Aug 12, 2020

@mattsurge yes.

@infa-sgorrela

@tejal29 ,

Do we have any timeline for fixing this issue for the latest alpine builds? We are kind of blocked from using kaniko to build alpine images.

We can't ignore alpine-baselayout, since it has core package updates:
apk upgrade --no-cache --ignore alpine-baselayout

@liuzhe7

liuzhe7 commented Oct 19, 2020

Is this problem solved now?

@czunker

czunker commented Oct 21, 2020

Another solution is to not mount the service account token automatically:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server
You probably don't need the token.

GitLab has a feature request to add this as an option: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4786
And without the mounted token there is no /var/run/secrets/kubernetes.io/serviceaccount directory and therefore no problem.
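For reference, a minimal pod spec sketch with the automount disabled; automountServiceAccountToken is the standard Kubernetes field, while the pod name, image tag, and args here are just placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build                    # placeholder name
spec:
  automountServiceAccountToken: false   # nothing gets mounted under /var/run/secrets
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args: ["--context=dir://workspace", "--no-push"]
```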

alt4 added a commit to alt4/docker-blrevive that referenced this issue Sep 16, 2022
@tobiasmcnulty

It seems like there is a setting for this? https://github.com/GoogleContainerTools/kaniko#flag---ignore-var-run

@acohenOT

@tobiasmcnulty --ignore-var-run=false doesn't appear to work either

/kaniko/executor --ignore-var-run=false --snapshotMode=redo --single-snapshot

error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/kubernetes.io/serviceaccount/..data: read-only file system

@tobiasmcnulty

@tobiasmcnulty --ignore-var-run=false doesn't appear to work either

/kaniko/executor --ignore-var-run=false --snapshotMode=redo --single-snapshot

error building image: error building stage: failed to get filesystem from image: error removing var/run to make way for new symlink: unlinkat /var/run/secrets/kubernetes.io/serviceaccount/..data: read-only file system

Boo. Okay. Thanks for testing it. Just saw this in the docs looking for something else and didn't see it mentioned elsewhere on this issue.

@jbg

jbg commented Sep 23, 2022

Note that if it did work, and you are building on K8s, you'd probably end up with secrets in your image.

ChandonPierre added a commit to coreweave/samba that referenced this issue Oct 10, 2022
ChandonPierre added a commit to coreweave/samba that referenced this issue Oct 10, 2022
ChandonPierre added a commit to coreweave/samba that referenced this issue Oct 11, 2022
* refactor(ci): Use Todie spec

* feat: bump to 4.16.4

* fix: disable alpine package upgrade

Due to this bug GoogleContainerTools/kaniko#1297
@VitorNilson

Hey folks, any news about this error? I'm getting this when I try to run apk update:

Log:

-------........-------
INFO[0058] Args: [-c apk update]                        
INFO[0058] Running: [/bin/sh -c apk update]             
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/aarch64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/main: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/main: No such file or directory
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/aarch64/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.16/community: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.16/community: No such file or directory
2 errors; 14 distinct packages available
error building image: error building stage: failed to execute command: waiting for process to exit: exit status 2

@thetredev

Bump :)

@itz-Jana

Bump. I'm trying something else, where I need to delete the content of /var/* and it's failing due to this.

@aaron-prindle aaron-prindle changed the title Can't use kaniko with alpine due to /var/run behavior Can't use kaniko with alpine 3.12+ due to /var/run behavior Jul 5, 2023
@aaron-prindle aaron-prindle changed the title Can't use kaniko with alpine 3.12+ due to /var/run behavior Can't use kaniko with alpine v3.12+ due to /var/run behavior Jul 5, 2023
@aaron-prindle aaron-prindle added issue/failed-to-rename priority/p1 Basic need feature compatibility with docker build. we should be working on this next. area/behavior all bugs related to kaniko behavior like running in as root priority/p2 High impact feature/bug. Will get a lot of users happy and removed area/behavior all bugs related to kaniko behavior like running in as root priority/p3 agreed that this would be good to have, but no one is available at the moment. labels Jul 5, 2023
@albertodiazdorado

Bump

@JacksonChen63

It seems like there is a setting for this? https://github.com/GoogleContainerTools/kaniko#flag---ignore-var-run

Thanks bro, it worked for me.
