Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Units created with "podman generate systemd --new" fail due missing podman state cleanup before start #7157

Closed
yangm97 opened this issue Jul 30, 2020 · 9 comments · Fixed by #7571
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@yangm97
Copy link
Contributor

yangm97 commented Jul 30, 2020

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

There's an issue with "undead" containers created with --rm from withn a shell script launched by a systemd unit which clog up state, and possibly corrupt state when too many of those are present.
They show up on ps -a as Created.

container prune on 1.6.x (plus armbian stretch) doesn't clean them, on 2.0.x (plus armbian buster) it almost always cleans all of those "undead" containers, except for when it fails with errors such as

ERRO[0009] error unmounting /var/lib/containers/storage/overlay/bla/merged: invalid argument

followed by

Error: error creating overlay mount to /var/lib/containers/storage/overlay/bla/merged: no such file or directory

(btw isn't it weird that a prune operation is trying to mount stuff?)

Steps to reproduce the issue:

I couldn't narrow down where this is coming, it's one of those issues which happen "on its own" and "when it feels like doing so" 🙃, so the reproduction I could come up with was to mimick my exact environment using the ansible playbook and scripts from this gist:

https://gist.github.com/yangm97/573b86a8b82181723aea96f70e690a98

This play will:

  • Create the container wrapper script and symlinks
  • Generate and persist a yggdrasil config from a j2 template
  • Create a systemd unit which consumes the symlink and config above

Having that set up:

  1. Try to abuse the host? Forced shutdowns maybe?

Describe the results you received:

$ sudo podman ps -a
CONTAINER ID  IMAGE                                                         COMMAND               CREATED       STATUS          PORTS   NAMES
019b650b5b0c  docker.io/yangm97/yggdrasil-go:latest                         -useconffile /etc...  6 weeks ago   ...

[truncated output]

1631fe0bddc3  docker.io/yangm97/yggdrasil-go:latest                         -useconffile /etc...  2 weeks ago   Up 2 weeks ago          yggdrasil.1594606637
20c48c933470  docker.io/yangm97/yggdrasil-go:latest                         -useconffile /etc...  3 months ago  

[truncated output]

Describe the results you expected:

Containers created with --rm should always be removed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      2.0.2
API Version:  1
Go Version:   go1.14.4
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/arm

Output of podman info --debug:

host:
  arch: arm
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 4
  distribution:
    distribution: debian
    version: "10"
  eventLogger: file
  hostname: hubot
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.45-sunxi
  linkmode: dynamic
  memFree: 87912448
  memTotal: 516595712
  ociRuntime:
    name: runc
    package: 'cri-o-runc: /usr/lib/cri-o-runc/sbin/runc'
    path: /usr/lib/cri-o-runc/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 1249280
  swapTotal: 258293760
  uptime: 422h 27m 16.73s (Approximately 17.58 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 13
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.14.4
  OsArch: linux/arm
  Version: 2.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

Listing... Done
podman/unknown 2.0.3~1 amd64
podman/unknown 2.0.3~1 arm64
podman/unknown 2.0.3~1 armhf [upgradable from: 2.0.2~2]
podman/unknown 2.0.3~1 ppc64el

Additional environment details (AWS, VirtualBox, physical, etc.):

Orange pi zero running armbian:

Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster
Linux hostname 5.4.45-sunxi #20.05.3 SMP Wed Jun 10 12:09:20 CEST 2020 armv7l GNU/Linux
@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 30, 2020
@mheon
Copy link
Member

mheon commented Jul 30, 2020

Do your systemd units include KillMode=none? Otherwise, systemd will aggressively kill Podman's cleanup process, preventing the container from being unmounted or removed.

In general, we strongly recommend using podman generate systemd to make unit files, as it handles some of the "gotchas" like this.

@yangm97
Copy link
Contributor Author

yangm97 commented Aug 1, 2020

Do your systemd units include KillMode=none?

Nope

In general, we strongly recommend using podman generate systemd to make unit files, as it handles some of the "gotchas" like this.

Yeah I mean this was just a wrapper script made for docker which I rushed converting to podman after all.

Otherwise, systemd will aggressively kill Podman's cleanup process, preventing the container from being unmounted or removed.

But still, this implies there might be cases where even the podman generated unit files trigger this issue, such as sudden power loss, am I correct?

@mheon
Copy link
Member

mheon commented Aug 1, 2020 via email

@yangm97
Copy link
Contributor Author

yangm97 commented Aug 10, 2020

Guess I'm going to close this as misconfiguration from my side, but feel free to reopen in case you guys need it.

@yangm97 yangm97 closed this as completed Aug 10, 2020
@yangm97
Copy link
Contributor Author

yangm97 commented Aug 19, 2020

Yes - we try and avoid that with an explicit 'podman rm' on the container in an ExecStartPre

Doesn't seem like it, from

info.ExecStartPre = "/bin/rm -f {{.PIDFile}} {{.ContainerIDFile}}"

# podman generate systemd --new yggdrasil
# container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.service
# autogenerated by Podman 2.0.2
# Wed Aug 19 14:58:13 UTC 2020

[Unit]
Description=Podman container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStartPre=/bin/rm -f %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.pid %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.ctr-id
ExecStart=/usr/bin/podman run --conmon-pidfile %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.pid --cidfile %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.ctr-id --cgroups=no-conmon --conmon-pidfile /run/yggdrasil.pid --cidfile /run/yggdrasil.ctr-id --cgroups=no-conmon -d --rm --device=/dev/net/tun --net=host --cap-add=NET_ADMIN --name=yggdrasil --interactive -v /etc:/etc -v /var/run:/var/run yangm97/yggdrasil-go:latest -useconffile /etc/yggdrasil.conf
ExecStop=/usr/bin/podman stop --ignore --cidfile %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.ctr-id -t 10
ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile %t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.ctr-id
PIDFile=%t/container-95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065.pid
KillMode=none
Type=forking

[Install]
WantedBy=multi-user.target default.target

Using the generated systemd unit still led me to a situation where the container refuses to start because of a name conflict.

Aug 19 15:21:40 orangepi podman[4597]: Error: error creating container storage: the container name "yggdrasil" is already in use by "95539329dea5d3f91410d8bac94804be3042832c66e87489a2750af376577065". You have to remove that container to be able to reuse that name.: that name is already in use

So it looks like the ExecStartPre from template only attempts to get rid of the pid and cid files (which, since they're stored at /run might be gone after a reboot), leaving the dead container on Podman state as is.

@yangm97 yangm97 changed the title Containers created with "podman --rm" from a custom systemd unit don't seem to be properly removed Units created with "podman generate systemd --new" fail due missing podman state cleanup before start Aug 19, 2020
@yangm97 yangm97 reopened this Aug 19, 2020
@vrothberg
Copy link
Member

Yes - we try and avoid that with an explicit 'podman rm' on the container in an ExecStartPre

We are cleaning up the files in ExecStartPre but not the containers.

However, there's an explicit clean-up for named containers where we're setting the --replace flag automatically. This way we enforce that a container with the same name will be removed.

@yangm97, did you remove the --replace from the generated unit? The replace feature has been added by commit 6118ab4 back in June and is available since Podman 2.0.

@yangm97
Copy link
Contributor Author

yangm97 commented Aug 20, 2020

@yangm97, did you remove the --replace from the generated unit? The replace feature has been added by commit 6118ab4 back in June and is available since Podman 2.0.

Nope, just ran the generate command and pasted as is. Looks like --replace defaults to false:

hasReplaceParam := false

@vrothberg
Copy link
Member

Here we're initializing the variable which is okay. But I could now track it down. We're not catching the --name=foo case but only --name foo. I'll prepare a fix in a bit.

@vrothberg
Copy link
Member

Opened #7571

vrothberg added a commit to vrothberg/libpod that referenced this issue Sep 9, 2020
The systemd generator looks for certain flags in the containers' create
commands to determine which flags need to be added.  In case of named
containers, the generator adds the `--replace` flag to prevent name
conflicts at container creation.  Fix the generator to not only cover
the `--name foo` syntax but also the `--name=foo` one.

Fixes: containers#7157
Signed-off-by: Valentin Rothberg <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants