Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pod can not be removed because of extra containers #11713

Closed
abitrolly opened this issue Sep 23, 2021 · 27 comments · Fixed by #11851
Closed

pod can not be removed because of extra containers #11713

abitrolly opened this issue Sep 23, 2021 · 27 comments · Fixed by #11851
Assignees
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@abitrolly
Copy link
Contributor

/kind bug

Description

podman pod creates an extra container from k8s.gcr.io/pause:3.5 that applications are not aware of and can not clean up properly. This happens with podman 3.3.1 and podman-compose.

Steps to reproduce the issue:

git clone https://github.com/pypa/warehouse
cd warehouse
podman-compose up
podman-compose down

Describe the results you received:

...
podman rm warehouse_notgithub_1
3ddc81c992e5be790200e41f0e38c53e5a05f339e4a5882082eed5129c664bf6
0
podman pod rm warehouse
Error: pod c806ecbbc4787bd909aca45540bfaf356a9b308cd2f974ce2cca586333151c68 has containers that are not ready to be removed: cannot remove container e7615dca95cd50548f968dbca974bcdbe9e000e1a317a53b3d70ae7aa31911a5 as it is running - running or paused containers cannot be removed without force: container state improper
125
$ podman ps | grep  e7615dca
e7615dca95cd  k8s.gcr.io/pause:3.5              4 days ago  Up 2 hours ago  0.0.0.0:1080->80/tcp, 0.0.0.0:4576->4576/tcp, 0.0.0.0:5433->5432/tcp, 0.0.0.0:8080->8000/tcp, 0.0.0.0:8964->8000/tcp, 0.0.0.0:8125->8125/udp, 0.0.0.0:8200->8200/tcp, 0.0.0.0:9000-9001->9000-9001/tcp  c806ecbbc478-infra

Describe the results you expected:

No errors when running podman pod rm.

Output of podman version:

Version:      3.3.1
API Version:  3.3.1
Go Version:   go1.16.6
Built:        Mon Aug 30 23:46:36 2021
OS/Arch:      linux/amd64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

@Luap99
Copy link
Member

Luap99 commented Sep 23, 2021

You need to stop the infra container. Either use podman pod stop ... before or use podman pod rm -f ...

@Luap99 Luap99 closed this as completed Sep 23, 2021
@abitrolly
Copy link
Contributor Author

@Luap99 the question is why there are extra containers in the first place and why people need to care about these internal details?

@baude
Copy link
Member

baude commented Sep 23, 2021

@abitrolly our pod implementation currently uses an infra container (as was standard in the kube world). The infra container basically owns the network and other namespaces. in the future, this is something we would like to eliminate but currently is not a priority.

@abitrolly
Copy link
Contributor Author

abitrolly commented Sep 23, 2021

@baude does "was" means that kubernetes doesn't use any extra container anymore? And doesn't expose them to users, right?

If the pod is empty, why the podman do not stop its infra container itself?

@mheon
Copy link
Member

mheon commented Sep 23, 2021

Allowing removal of pods where 1 container is running (and it's the infra container) seems reasonable

@rhatdan
Copy link
Member

rhatdan commented Sep 23, 2021

I agree, we should automatically stop the infra container with a -t 0 field.

@baude
Copy link
Member

baude commented Sep 23, 2021

@abitrolly you asked why it was this way in podman, i answered. it seems like are in agreement to fix the behavior.

@abitrolly
Copy link
Contributor Author

@baude yep. It would also be nice to keep this ticket open until it is fixed.

@mheon mheon reopened this Sep 24, 2021
@cdoern
Copy link
Contributor

cdoern commented Sep 29, 2021

Is this only in conjunction with podman-compose? Or is this affected by the image infra is using? running a podman pod create podman pod start and then podman pod rm seems to work for me when running normally. If this is just a bug where infra needs to be stopped upon removal when not using the pause image, I can take this one.

@cdoern cdoern self-assigned this Sep 29, 2021
@cdoern
Copy link
Contributor

cdoern commented Sep 30, 2021

another question @mheon, does this go through libpod/runtime_pod_linux.go:171? or is podman-compose a separate path? I was looking at the podman-compose repo, trying to decipher the python but just got down to here which I am unsure whether that just leads back to podman.

@mheon
Copy link
Member

mheon commented Sep 30, 2021

It's directly invoking the CLI, so that's just a podman pod rm (no --force) command.

@cdoern
Copy link
Contributor

cdoern commented Sep 30, 2021

hmmm interesting because libpod/runtime_pod_linux.go:171 already has the code to retrieve and remove infra if it is the only container in the pod. I guess this could have something to do with removal of the previous container from the pod? If that isn't removed properly then numContainers > 1 still

@mheon
Copy link
Member

mheon commented Sep 30, 2021

Yeah, that sounds right - I imagine we must have been handling this already.

I think at this point we need to try and reproduce with podman-compose - the actual removal of the pod may be typical, but it could be doing something unusual before the pod is removed that is confusing us.

@cdoern
Copy link
Contributor

cdoern commented Oct 1, 2021

update here so I can keep track of my though process: my only idea is that since the podman-compose down process stops all containers before removing the pod, we should try to get infra in the compose.containers entity so it is stopped before we even run into this issue. Not sure how to go about getting this though...

cdoern added a commit to cdoern/podman-compose that referenced this issue Oct 1, 2021
Pods were sometimes failing to remove if infra was running.
Added a pod stop command and --force flag to pod rm within
compose_down

resolves containers/podman#11713

Signed-off-by: cdoern <[email protected]>
@abitrolly
Copy link
Contributor Author

@cdoern that doesn't fix the root cause, so I don't think containers/podman-compose#340 resolves this behavior,

@cdoern
Copy link
Contributor

cdoern commented Oct 1, 2021

@cdoern that doesn't fix the root cause, so I don't think containers/podman-compose#340 resolves this behavior,

What do you mean by root issue @abitrolly I am pretty sure a --force will get rid of infra if the pod stop didn't stop it.

@abitrolly
Copy link
Contributor Author

abitrolly commented Oct 2, 2021

The root issue is that podman produces extra container that users are not aware of. --force or removing it explicitly in podman-compose is a workaround. The proper fix is to remove the extra container in podman code when pod rm is issued and it is the only container there.

@abitrolly
Copy link
Contributor Author

This is the place where the failure happens.

// Ensure state appropriate for removal
if err := ctr.checkReadyForRemoval(); err != nil {
return errors.Wrapf(err, "pod %s has containers that are not ready to be removed", p.ID())
}

@abitrolly
Copy link
Contributor Author

But the code that explicitly checks for this condition is above here.

// If the only container in the pod is the pause container, remove the pod and container unconditionally.
pauseCtrID := p.state.InfraContainerID
if numCtrs == 1 && ctrs[0].ID() == pauseCtrID {
removeCtrs = true
force = true
}

Maybe there are some race conditions. Need a verbose mode to show.

@abitrolly
Copy link
Contributor Author

abitrolly commented Oct 2, 2021

I used more lightweight project to track the issue. So the error happens when pod contains exited containers from one time runs. That's probably why numCtrs == 1 check fails.

     "NumContainers": 3,
     "Containers": [
          {
               "Id": "2bf16ddb526589eca084bf87925cf2c718591005c3d51bf9c302f129ad57fb0f",
               "Name": "112c0a12dccb-infra",
               "State": "running"
          },
          {
               "Id": "66732414d5001adb19eb1bed40f190442af6ae888b1aee2f69989c2a8db8d795",
               "Name": "cheatsh_app_tmp24211",
               "State": "exited"
          },
          {
               "Id": "9840deb080b3b93e22380b533c8613d017aa40f4498483a966e23c168fde3cfb",
               "Name": "cheatsh_app_tmp52438",
               "State": "exited"
          }
     ]

I occasionally use podman-compose run --entrypoint=bash app to troubleshoot things in containers.

@mheon
Copy link
Member

mheon commented Oct 2, 2021

OK, that makes more sense. I think that can be said to be working as designed; the pod is not empty, so we're refusing to remove without --force.

@abitrolly
Copy link
Contributor Author

If the pod is not empty, then the error should list all containers that are not empty, but it lists only internal container.

@mheon
Copy link
Member

mheon commented Oct 3, 2021

Sure - looks like the error messages are out of order.

@abitrolly
Copy link
Contributor Author

Speaking of errors it would be more user friendly to report container names along with their hashes to understand what is going on.

@cdoern
Copy link
Contributor

cdoern commented Oct 4, 2021

The one shot containers are not removed first using container rm? Seems like that is the expected outcome, right? I say we should either print a better err msg and leave it as is, or make sure that infra actually gets removed via --force

@rhatdan
Copy link
Member

rhatdan commented Oct 4, 2021

Yes let's improve the error message, and ensure that the remove --force.

@cdoern
Copy link
Contributor

cdoern commented Oct 4, 2021

so my compose PR, fixes the --force aspect of this right? I just put together the new error message.

cdoern added a commit to cdoern/podman that referenced this issue Oct 18, 2021
Made changes so that if the pod contains all exited containers and only infra is running, remove the pod.

resolves containers#11713

Signed-off-by: cdoern <[email protected]>
mheon pushed a commit to mheon/libpod that referenced this issue Nov 12, 2021
Made changes so that if the pod contains all exited containers and only infra is running, remove the pod.

resolves containers#11713

Signed-off-by: cdoern <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
6 participants