loading seccomp filter: invalid argument #2865
Seeing this too with containerd. I can repro easily in an AKS cluster with rc93. rc92 works just fine. Calls look like this:
I truncated the
Worth noting,
Bisected to 7a8d716
/cc @cyphar
I believe #2871 fixes this.
There is a bug with rc93 that may be causing CI instability (opencontainers/runc#2865). Downgrading to rc92 to see if we get better runs. Signed-off-by: Brian Goff <[email protected]>
Maybe the same issue: containerd/containerd#5280
We get this error pretty consistently on two machines running Ubuntu 16.04. Downgrading containerd.io to 1.4.3 doesn't fix it, as that still seems to use runc rc93 rather than rc92. It was plaguing us for weeks; sometimes it locked the system up and a reboot was the only fix, and deploys were getting stuck. We found in the end, after reading this thread, that the way to unstick them is to run:
That will strace each one until they exit, and then the system works.
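The exact command isn't captured above; as a rough sketch of the workaround being described (attach strace to each stuck runc init process until it exits), assuming pgrep and strace are available, something like:

```sh
# Hypothetical reconstruction of the workaround described above; the exact
# command from the thread is not shown here. Attaches strace to each stuck
# "runc init" process in turn; strace stays attached until the process exits.
for pid in $(pgrep -f 'runc init'); do
    sudo strace -p "$pid" -o /dev/null
done
```

Per the comment above, once the blocked runc init processes exit, the system recovers.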
This might be fixed by #2871, which was just merged. @oppianmatt @wu0407 @areed @cpuguy83, can you please test the runc tip and report back whether the bug is fixed?
We were only experiencing this on prod, until today that is. I tried removing an image from our staging server and it got stuck, so I'm using it as an opportunity to test. I downloaded runc rc92, replaced the binary on the system, and restarted the containerd and docker services. When restarting docker in this state we see huge stack dumps in the log, all about mutex locks; we can see some were quite stuck for a while. So much dumping is going into syslog that the logs are 5 minutes behind. After that, though, we can remove the stuck containers. Of course we won't know for sure until I get another stuck instance; if I do, I will report right back, so take no news as good news.
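For reference, a minimal sketch of the downgrade steps described in this comment; the binary path, service names, and the local rc92 binary location are assumptions, not taken from the thread:

```sh
# Rough sketch of the downgrade described above (paths and service names are
# assumptions; adjust for your setup).

# Back up the current runc binary.
sudo cp /usr/bin/runc /usr/bin/runc.bak

# Replace it with an rc92 binary downloaded from the runc releases page
# (./runc-rc92 is an illustrative local path).
sudo install -m 0755 ./runc-rc92 /usr/bin/runc

# Restart the runtimes so they pick up the replaced binary.
sudo systemctl restart containerd
sudo systemctl restart docker

# Confirm the version now in use.
runc --version
```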
@kolyshkin Verified both by cherry-picking that commit onto rc93 and testing HEAD directly.
Okay, so it looks like it's fixed then. Please comment if this still isn't fixed after testing on tip. We will do a new release soon. |
This reverts commit 87b0c75.
We're seeing machines with several runc init processes blocked writing the same message to stderr:

This appears to cause a chain reaction on Kubernetes nodes, where a lock acquired during docker start for the pause container of a Pod blocks PLEG and the node flaps between Ready and NotReady.