This repository has been archived by the owner on May 12, 2021. It is now read-only.

SIGCONT is not terminating kube-dns container as it should #276

Closed
sboeuf opened this issue Apr 30, 2018 · 4 comments

Comments


sboeuf commented Apr 30, 2018

We need to investigate why the SIGCONT signal does not terminate the container process when the kube-dns pod is torn down. CRI-O expects this signal to terminate the container, which means we're missing something in our codebase to handle this corner case.


sboeuf commented May 2, 2018

I have spent some time on this and narrowed down what the kube-dns pod expects.
When the pod is created, one of its containers is created with its StopSignal set to SIGCONT. This signal is not supposed to kill the container, and this is done on purpose: it makes sure the container stays around until the timeout (also called the grace period) is reached.

The timeout is set through the call to StopContainer(), and here are the parameters for this API.

Now, when our cluster is running and we eventually call kubeadm reset, kubelet calls StopContainer() for every container of the kube-dns pod. Because the underlying CRI implementation (CRI-O in our case) still sees the container around after the timeout has expired, it sends a SIGKILL signal. This is what the CRI specification expects, and this whole behavior is exactly as it should be.

We would not have experienced any issue if we had the patch cri-o/cri-o#1419 (already merged into CRI-O master, but not in CRI-O 1.9), because CRI-O would have issued the SIGKILL through the OCI runtime instead of killing the shim process directly.

This confirms that it is still good to handle a SIGKILL from the shim, because we might not anticipate all the cases.

But #275 definitely solved the issue and backporting cri-o/cri-o#1419 to CRI-O 1.9 could be a pretty good idea.


sboeuf commented May 2, 2018

/cc @egernst @grahamwhaley


sboeuf commented May 2, 2018

I think we can close this now, please feel free!

@grahamwhaley
Contributor

Nice digging @sboeuf - thanks for the update. Yes, I'm happy that we/you now know what happens and why, and how we handle it :-) - closing!

zklei pushed a commit to zklei/runtime that referenced this issue Jun 13, 2019