SIGCONT is not terminating kube-dns container as it should #276

sboeuf · 2018-04-30T16:10:20Z

We need to investigate why the SIGCONT signal sent does not terminate the container process when the kube-dns pod is torn down. CRI-O expects this signal to terminate the container, meaning we're missing something in our codebase to handle such a corner case.

sboeuf · 2018-05-02T22:07:36Z

I have spent some time on this and have narrowed down what the kube-dns pod expects.
When the pod is created, one of its containers is created with its StopSignal set to SIGCONT. This signal is not supposed to kill the container, and this is done on purpose: it makes sure the container stays around until the timeout (also called the grace period) is reached.
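
To make the point about SIGCONT concrete, here is a minimal sketch, assuming a POSIX host and using a plain sleeping child process as a stand-in for the container process (the helper name `survives_sigcont` is made up for illustration, it is not part of CRI-O or the runtime). It only shows that a process with the default signal disposition is still alive after receiving SIGCONT:

```python
import signal
import subprocess
import sys
import time


def survives_sigcont(wait_seconds: float = 0.5) -> bool:
    """Return True if a sleeping child is still running after SIGCONT."""
    # Hypothetical stand-in for the container process: a child that sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # SIGCONT resumes a stopped process; a running one is unaffected.
        child.send_signal(signal.SIGCONT)
        time.sleep(wait_seconds)
        # poll() returns None while the child is still running.
        return child.poll() is None
    finally:
        # Clean up the stand-in process regardless of the outcome.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("still alive after SIGCONT:", survives_sigcont())
```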

The timeout is set through the call to StopContainer(). In the CRI API, the request carries the container ID and the timeout in seconds.

Now, when our cluster is running and we eventually call kubeadm reset, kubelet calls StopContainer() for every container of the kube-dns pod. Because the underlying CRI implementation (CRI-O in our case) still sees the container around after the timeout has expired, it sends a SIGKILL signal. This is what the CRI specification expects, and the whole behavior is exactly as it should be.
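
The stop sequence described above can be sketched roughly as follows. This is not CRI-O's code, only an illustration of the "stop signal, wait for the grace period, then SIGKILL" pattern on a local child process; `spawn_sleeper` and `stop_with_grace_period` are hypothetical names, and a POSIX host is assumed:

```python
import signal
import subprocess
import sys


def spawn_sleeper() -> subprocess.Popen:
    """Start a hypothetical stand-in for a container process."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )


def stop_with_grace_period(proc: subprocess.Popen,
                           stop_signal: int,
                           timeout: float) -> bool:
    """Send stop_signal, wait up to timeout seconds, then fall back to SIGKILL.

    Returns True if the SIGKILL fallback was needed.
    """
    proc.send_signal(stop_signal)
    try:
        # Grace period: give the process a chance to exit on its own.
        proc.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        # Still around after the timeout, so kill it unconditionally.
        proc.kill()
        proc.wait()
        return True


if __name__ == "__main__":
    child = spawn_sleeper()
    escalated = stop_with_grace_period(child, signal.SIGCONT, timeout=1.0)
    print("SIGKILL needed:", escalated, "return code:", child.returncode)
```

With SIGCONT as the stop signal, the wait always runs to the end of the timeout and the fallback kill is what finally removes the process, which matches the kube-dns behavior described here.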

And we would not have experienced any issue if we had had the patch cri-o/cri-o#1419 (already merged into CRI-O master, but not in CRI-O 1.9), because CRI-O would have issued the SIGKILL through the OCI runtime instead of killing the shim process directly.

This confirms that it is still good to handle the shim being killed with SIGKILL, because we might not anticipate all the cases.

But #275 definitely solved the issue and backporting cri-o/cri-o#1419 to CRI-O 1.9 could be a pretty good idea.

sboeuf · 2018-05-02T22:07:56Z

/cc @egernst @grahamwhaley

sboeuf · 2018-05-02T22:08:15Z

I think we can close this now, please feel free!

grahamwhaley · 2018-05-03T10:29:33Z

Nice digging @sboeuf - thanks for the update. Yes, I'm happy that we/you now know what happens and why, and how we handle it :-) - closing!

sboeuf mentioned this issue Apr 30, 2018

virtcontainers: Properly remove the container when shim gets killed #275

Merged

grahamwhaley closed this as completed May 3, 2018

zklei pushed a commit to zklei/runtime that referenced this issue Jun 13, 2019

travis: Enable travis ci for ppc64le

4ef4971

Fixes: kata-containers#276 Signed-off-by: Nitesh Konkar [email protected]
