This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Implement workaround to clean up leaking cgroups #570

Merged
merged 4 commits into master from kevin/master/1219-fix-dangling-cgroups on Mar 13, 2020

Conversation

@knisbet knisbet (Contributor) commented Mar 11, 2020

This change implements a cleaner that scans for cgroups created by
systemd-run --scope that have no pids assigned, indicating that the
cgroup is unused and should be cleaned up. On some systems, due to
either systemd or the kernel, the scope is not cleaned up once the
pids within it have finished executing, which eventually leads to a
memory leak.

Kubernetes uses systemd-run --scope when creating mount points that
may require drivers to be loaded or run in a separate context from
kubelet, which is how the leak above occurs.

kubernetes/kubernetes#70324
kubernetes/kubernetes#64137

Updates gravitational/gravity#1219
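
As a rough illustration of the scanning idea described above (not the actual implementation in tool/planet/cgroup.go), here is a minimal sketch assuming a cgroup v1 systemd hierarchy mounted at /sys/fs/cgroup/systemd and transient scopes named run-*.scope; the function name and matching rules are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findEmptyScopes (hypothetical) walks a systemd cgroup hierarchy and collects
// transient "run-*.scope" cgroups that have no pids assigned, i.e. scopes that
// look like leftovers from systemd-run --scope invocations.
func findEmptyScopes(root string) ([]string, error) {
	var empty []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return nil // skip unreadable entries and plain files
		}
		base := filepath.Base(path)
		// systemd-run --scope creates transient units named run-<id>.scope.
		if !strings.HasPrefix(base, "run-") || !strings.HasSuffix(base, ".scope") {
			return nil
		}
		// An empty cgroup.procs file means no pids belong to this scope.
		procs, err := os.ReadFile(filepath.Join(path, "cgroup.procs"))
		if err == nil && strings.TrimSpace(string(procs)) == "" {
			empty = append(empty, path)
		}
		return filepath.SkipDir // scopes are leaves; no need to descend further
	})
	return empty, err
}

func main() {
	scopes, err := findEmptyScopes("/sys/fs/cgroup/systemd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	for _, scope := range scopes {
		fmt.Println("empty scope:", scope)
	}
}
```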

@knisbet knisbet requested review from a team, r0mant and a-palchikov March 11, 2020 03:25
tool/planet/agent.go (outdated)
tool/planet/cgroup.go
tool/planet/cgroup.go (outdated)

var paths []string

baseTime := time.Now().Add(-time.Minute) // cutoff: only scopes older than this are eligible for cleanup
Reviewer (Contributor):

Do you think a 1 minute interval is enough? Maybe, to be on the safe side, make it something like an hour, or at least 10 minutes?

Author (Contributor):

I was thinking the likelihood of a race here is something like a few ns, maybe a few ms if the system is busy. This should just be systemd creating a scope, which starts out empty, and then placing the launched process into that cgroup. So the window is extremely small, and a minute is already overkill. If the logic here is incorrect and this stops scopes it shouldn't, I don't see much difference between stopping a scope that is 1 minute old and one that is 1 hour old in terms of the problems it would cause, except that the 1 minute threshold makes the bug a lot more apparent.

I suppose the potential for a false positive is a process inside planet creating a scope and then not using it for a few minutes, in which case this cleaner would come along and remove it out from under it.

It would be possible to get the scope object from systemd first and match specifically against Kubernetes mounts, but I was trying to keep the implementation relatively simple.
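
To make the trade-off discussed above concrete, here is a hedged sketch of how the grace period could gate the actual cleanup: a scope is only stopped when it is both empty and older than the cutoff, so a scope that systemd has just created (and not yet populated with a pid) is left alone. The helper name, the unit-name derivation, and the use of systemctl are illustrative assumptions, not the code merged in this PR:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// maybeStopScope (hypothetical) stops a scope only if it has no pids and has
// existed for at least minAge, guarding against the race where systemd has
// created the scope but not yet moved the launched process into it.
func maybeStopScope(cgroupPath string, minAge time.Duration) error {
	info, err := os.Stat(cgroupPath)
	if err != nil {
		return nil // already gone, nothing to do
	}
	if time.Since(info.ModTime()) < minAge {
		return nil // too new: give systemd time to place the pid
	}
	procs, err := os.ReadFile(filepath.Join(cgroupPath, "cgroup.procs"))
	if err != nil || strings.TrimSpace(string(procs)) != "" {
		return nil // unreadable or still in use
	}
	// Ask systemd to stop the transient unit; the unit name is assumed to be
	// the base name of the cgroup directory, e.g. "run-r0123abcd.scope".
	unit := filepath.Base(cgroupPath)
	if out, err := exec.Command("systemctl", "stop", unit).CombinedOutput(); err != nil {
		return fmt.Errorf("stopping %s: %v: %s", unit, err, out)
	}
	return nil
}

func main() {
	scope := "/sys/fs/cgroup/systemd/system.slice/run-r0123456789abcdef.scope" // example path
	if err := maybeStopScope(scope, time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```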

tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
@knisbet knisbet merged commit 00ed8e6 into master Mar 13, 2020
@knisbet knisbet deleted the kevin/master/1219-fix-dangling-cgroups branch March 13, 2020 15:23
knisbet pushed a commit that referenced this pull request Mar 17, 2020
* Implement workaround to clean up leaking cgroups

* change logging level for cgroup cleanup

* address review feedback

* address review feedback

(cherry picked from commit 00ed8e6)
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
@knisbet knisbet mentioned this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
3 participants