docker-ce-18.09.2 and/or containerd.io-1.2.2 prevent containers from running #595

Closed
2 of 3 tasks
pmoris opened this issue Feb 16, 2019 · 13 comments

@pmoris

pmoris commented Feb 16, 2019

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

Containers should start and remain running.

Actual behavior

I suspect that the update to docker-ce-18.09.2 and/or containerd.io-1.2.2 crashed my running containers and prevents the creation of new ones. Both actions lead to the following error:

Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown


All running containers (which were managed by docker-compose) show the exit status Exited (128) (postgres, redis, nginx and django) or Exited (137) (celery-worker and celery-beat), which, from what I've gathered, points to an OOM error. However, the container logs show that my processes received shutdown requests (SIGTERM) and don't mention memory issues.

postgres_container | 2019-02-12T01:29:30.089053475Z 2019-02-12 01:29:30.088 UTC [1] LOG:  received smart shutdown request
postgres_container | 2019-02-12T01:29:30.111902444Z 2019-02-12 01:29:30.111 UTC [1] LOG:  worker process: logical replication launcher (PID 27) exited with exit code 1
postgres_container | 2019-02-12T01:29:30.111937077Z 2019-02-12 01:29:30.111 UTC [22] LOG:  shutting down
redis_container  | 2019-02-12T01:29:30.123951274Z 1:signal-handler (1549934970) Received SIGTERM scheduling shutdown...
nginx_container  | 2019-02-12T01:29:30.169354823Z 2019/02/12 01:29:30 [alert] 1#1: unlink() "/var/run/nginx.pid" failed (13: Permission denied)
postgres_container | 2019-02-12T01:29:30.173129602Z 2019-02-12 01:29:30.171 UTC [1] LOG:  database system is shut down
redis_container  | 2019-02-12T01:29:30.181581336Z 1:M 12 Feb 01:29:30.177 # User requested shutdown...
\\\ # omitted full redis shutdown logs
celery_worker_container | 2019-02-12T01:29:30.257633495Z [2019-02-12 01:29:30,224: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
celery_worker_container | 2019-02-12T01:29:30.257691039Z Traceback (most recent call last):
celery_worker_container | 2019-02-12T01:29:30.257697634Z   File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 177, in _read_from_socket
celery_worker_container | 2019-02-12T01:29:30.257701676Z     raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
celery_worker_container | 2019-02-12T01:29:30.257705303Z OSError: Connection closed by server.
\\\ # omitted python traceback
celery_worker_container | 2019-02-12T01:29:32.415761098Z [2019-02-12 01:29:32,415: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -2 connecting to redis:6379. Name or service not known..
celery_worker_container | 2019-02-12T01:29:32.415812976Z Trying again in 4.00 seconds...
celery_worker_container | 2019-02-12T01:29:32.415819541Z 
celery_worker_container | 2019-02-12T01:29:36.478919701Z [2019-02-12 01:29:36,478: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -2 connecting to redis:6379. Name or service not known..
celery_worker_container | 2019-02-12T01:29:36.478969366Z Trying again in 6.00 seconds...
celery_worker_container | 2019-02-12T01:29:36.478974635Z
# final message

docker inspect container

"State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 128,
            "Error": "OCI runtime create failed: container_linux.go:344: starting container process caused \"process_linux.go:293: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown",
            "StartedAt": "2019-02-05T02:01:26.287177994Z",
            "FinishedAt": "2019-02-12T01:29:31.16757356Z"
        },
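
A side note on the exit statuses above: by shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128). So 137 corresponds to signal 9 (SIGKILL), which the kernel OOM killer uses but which is also sent by `docker kill` and by `docker stop` after its grace period. The `"OOMKilled": false` field in the inspect output is the more direct check. The following is only a minimal sketch; `exit_signal` is a hypothetical helper name, not a Docker command.

```shell
# exit_signal CODE
# Prints the signal name implied by an exit status above 128
# (convention: status = 128 + signal number); prints nothing otherwise.
exit_signal() {
    code=$1
    if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        kill -l "$((code - 128))"
    fi
}

exit_signal 137   # SIGKILL (signal 9)
exit_signal 143   # SIGTERM (signal 15)
exit_signal 128   # no output: not a signal-derived status
```

On a Docker host, both values can be read in one go with `docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' redis_container`.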

The same error appears when I try to spin up a new container.

Update log

/var/cpanel/updatelogs/update.1549934941.log

[2019-02-12 02:29:47 +0100]      [/usr/local/cpanel/scripts/rpmup]   Updating   : 1:docker-ce-cli-18.09.2-3.el7.x86_64                         1/6
[2019-02-12 02:29:47 +0100]      [/usr/local/cpanel/scripts/rpmup]    Updating   : containerd.io-1.2.2-3.3.el7.x86_64                           2/6
[2019-02-12 02:29:47 +0100]      [/usr/local/cpanel/scripts/rpmup]    Updating   : 3:docker-ce-18.09.2-3.el7.x86_64                             3/6

Note that there's a one-hour difference between these timestamps and the container logs, because the timezone configured on my OS (+0100) differs from the one in the containers (UTC).

/var/log/messages

I added this as an attachment (var-log-messages.txt). It shows that an update was initiated right before the containers crashed, and it reports the same error that the inspect command showed:

Feb 12 02:29:45 server1 dockerd: time="2019-02-12T02:29:45.867408717+01:00" level=error msg="Failed to start containerr da183f015a4163ac9826971ade22b2ecc27a8cf661f4982c7a130d3cc5c3d268: OCI runtime create failed: container_linux.go:344: starting container process caused \"process_linux.go:293: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown"

OS and kernel

$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
$ uname -s -r
Linux 3.10.0-229.20.1.el7.centos.plus.x86_6
$ cat /proc/version
Linux version 3.10.0-229.20.1.el7.centos.plus.x86_64 ([email protected]) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Wed Nov 4 01:06:14 UTC 2015

Downgrade attempt

yum downgrade docker-ce (3:18.09.1-3.el7) still results in the same error message when I try to recreate my containers.

$ docker-compose up --force-recreate -d
Removing postgres_container
Removing redis_container
Recreating 5a76e1750c7b_redis_container ... 
Recreating bdb2d0d65cd8_postgres_container ... error

Recreating 5a76e1750c7b_redis_container    ... error
bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown

ERROR: for 5a76e1750c7b_redis_container  Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown

ERROR: for postgres  Cannot start service postgres: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown

ERROR: for redis  Cannot start service redis: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown
ERROR: Encountered errors while bringing up the project.
make: *** [run] Error 1

Downgrading containerd.io (containerd.io.x86_64 0:1.2.2-3.el7) in addition to docker-ce does allow me to recreate the containers.

EDIT: upgrading the kernel to 3.10.0-957.5.1.el7.centos.plus.x86_64 also fixes the issue.

@thaJeztah
Member

Note that CentOS uses a rolling release model, which means that older versions (including their kernels) reach EOL once a newer version is released.

Kernel 3.10.0-229 is a really old version of the CentOS kernel, so running it is definitely not recommended.

Also make sure you don't have a custom MountFlags option set in your systemd unit file if you're running Docker 18.09 or later (see #485 (comment)).

@pmoris
Author

pmoris commented Feb 19, 2019

There's no MountFlags option specified in /lib/systemd/system/docker.service. Is that the correct file or should I be looking somewhere else?

@thaJeztah
Member

Is that the correct file or should I be looking somewhere else?

The easiest way to find out is to run systemctl cat docker.service; that shows the contents of the unit file and of any override/drop-in files loaded for the service.
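
The check described above could be scripted roughly as follows. This is a sketch only: `has_mountflags` is a hypothetical helper name, and it assumes that a line starting with `MountFlags=` (ignoring leading whitespace, and therefore skipping commented-out lines) is what needs to be found.

```shell
# has_mountflags: reads unit-file text on stdin and succeeds (exit 0)
# if any non-comment line sets MountFlags=.
has_mountflags() {
    grep -Eq '^[[:space:]]*MountFlags[[:space:]]*='
}

# Usage on a systemd host (not executed here):
#   systemctl cat docker.service | has_mountflags && echo "MountFlags is set"
```

Reading from stdin keeps the helper usable with `systemctl cat` output as well as with a plain unit file.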

@pmoris
Author

pmoris commented Feb 19, 2019

Thanks for the swift reply! I see no mention of MountFlags in there. Here's the full output if that helps.

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

@thaJeztah
Member

OK, thanks! Looks like there's indeed no MountFlags set for the service, so that's not the problem.

Looking at the error again (write init-p: broken pipe "": unknown), this looks pretty similar to #597

@trapier

trapier commented Feb 20, 2019

There is no new information in this comment. Only condensed recreate steps and confirmation of previous observations that the issue goes away upon containerd downgrade or kernel upgrade.

Minimal Recreate

mkdir centos-1509
cd centos-1509
vagrant init centos/7 --box-version 1509.01
vagrant up
vagrant ssh
# install latest CE package versions
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo  https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce-18.09.2-3.el7  docker-ce-cli-18.09.2-3.el7 containerd.io-1.2.2-3.3.el7
# enable docker service
sudo systemctl enable --now docker
# trigger issue
sudo docker run --rm alpine

Result:

$ sudo docker run --rm alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
6c40cc604d8e: Pull complete 
Digest: sha256:b3dbf31b77fd99d9c08f780ce6f5282aba076d70a513a8be859d8d3a4d0c92b8
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:293: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown.

Confirmed relief via containerd downgrade or kernel upgrade

As observed by @pmoris (thanks!), issue goes away when downgrading containerd:

sudo yum downgrade -y containerd.io-1.2.2-3.el7

... or when upgrading to latest kernel (3.10.0-957.5.1.el7).

@trapier

trapier commented Feb 20, 2019

Pasted the install steps into a vagrant shell provisioner and bisected by vagrant box version:

test: docker run --rm alpine

pass/fail  box      kernel
fail       1601.01  3.10.0-327.4.5.el7.x86_64
pass       1611.01  3.10.0-514.2.2.el7.x86_64
pass       1705.02  3.10.0-514.21.2.el7.x86_64
pass       1802.01  3.10.0-693.17.1.el7.x86_64

-327 corresponds to a RHEL 7.2 kernel.
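
To compare a host against the bisect results above, something like the following could work. It is a rough sketch: both function names are hypothetical, it only understands EL7-style `3.10.0-<build>` release strings, and the 514 threshold is an assumption taken from the oldest build that passed in this bisect (builds between 327 and 514 were not tested).

```shell
# el7_kernel_build RELEASE
# Prints the build number from an EL7 kernel release string such as
# "3.10.0-327.4.5.el7.x86_64" (here: 327); prints nothing if it does not match.
el7_kernel_build() {
    printf '%s\n' "$1" | sed -n 's/^3\.10\.0-\([0-9][0-9]*\)\..*$/\1/p'
}

# kernel_passed_bisect RELEASE
# Succeeds if the build number is >= 514, the oldest build that passed above.
# Fails for older builds and for release strings that are not EL7 3.10.0 kernels.
kernel_passed_bisect() {
    build=$(el7_kernel_build "$1")
    [ -n "$build" ] && [ "$build" -ge 514 ]
}

# Usage (not executed here):
#   kernel_passed_bisect "$(uname -r)" || echo "kernel not known to pass"
```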

@thaJeztah
Member

Thanks @trapier - so, the runc fix requires a kernel feature that was added in kernel 3.17, but was backported in RHEL kernels.

I wonder if kernel -514 was the first kernel they backported it to.

For Docker Engine Community, this is not an issue (it is not supported on RHEL, only on CentOS, where only the latest kernel version is supported), but for Docker Engine Enterprise, we need to check whether there are still versions of Docker that are supported on RHEL 7.2 (if so, an alternative fix is needed).

@andrewhsu
Contributor

Docker EE no longer has any versions supported on RHEL 7.2: https://success.docker.com/article/compatibility-matrix

@leeningli

leeningli commented Feb 22, 2019

Maybe my test is helpful. Setup: Docker 18.09.2, CentOS 7.6.

1. Kernel: 3.10.215
   Command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: I got the error.
2. Kernel: 3.10.215
   Command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: I got the error.
3. Kernel: 3.10.0.927
   Command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: I got the error.
4. Kernel: 3.10.0.927
   Command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: It is OK.
5. Kernel: 4.20
   Command: docker run -d -it -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: I got the error.
6. Kernel: 4.20
   Command: docker run -d -it --net=host -e MYSQL_ROOT_PASSWORD="lee" -mysql:5.7
   Result: It is OK.

@thaJeztah
Member

@leeningli so in each case, you start two MySQL containers; one with its own networking namespace, and one with --net=host (so using the host's networking namespace) correct?

Is there anything in the system- or daemon logs? (also might want to check audit logs to see if SELinux is involved)

@YumeMichi

Linux localhost.localdomain 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

So how can I downgrade docker-ce on CentOS 7? My production server cannot be restarted.

@thaJeztah
Member

Let me close this ticket for now, as it looks like it went stale.

thaJeztah closed this as not planned on Jul 8, 2023