This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Implement workaround to clean up leaking cgroups #570

Merged
merged 4 commits into master from kevin/master/1219-fix-dangling-cgroups on Mar 13, 2020

Conversation

@knisbet knisbet (Contributor) commented Mar 11, 2020

This change implements a cleaner that scans for cgroups created by
systemd-run --scope that have no pids assigned, indicating that the
cgroup is unused and should be cleaned up. On some systems, due to
either systemd or the kernel, the scope is not cleaned up once the
pids within it have finished executing, which eventually leads to a
memory leak.

Kubernetes uses systemd-run --scope when creating mount points that
may require drivers to be loaded or run in a separate context from
kubelet, which is how the leak above occurs.

kubernetes/kubernetes#70324
kubernetes/kubernetes#64137

Updates gravitational/gravity#1219
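
As a rough illustration of the scanning idea described above (not the actual implementation in tool/planet/cgroup.go), here is a minimal sketch assuming a cgroup v1 systemd hierarchy mounted at /sys/fs/cgroup/systemd and transient scopes named run-*.scope; the function name and matching rules are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findEmptyScopes (hypothetical) walks a systemd cgroup hierarchy and collects
// transient "run-*.scope" cgroups that have no pids assigned, i.e. scopes that
// look like leftovers from systemd-run --scope invocations.
func findEmptyScopes(root string) ([]string, error) {
	var empty []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || !info.IsDir() {
			return nil // skip unreadable entries and plain files
		}
		base := filepath.Base(path)
		// systemd-run --scope creates transient units named run-<id>.scope.
		if !strings.HasPrefix(base, "run-") || !strings.HasSuffix(base, ".scope") {
			return nil
		}
		// An empty cgroup.procs file means no pids belong to this scope.
		procs, err := os.ReadFile(filepath.Join(path, "cgroup.procs"))
		if err == nil && strings.TrimSpace(string(procs)) == "" {
			empty = append(empty, path)
		}
		return filepath.SkipDir // scopes are leaves; no need to descend further
	})
	return empty, err
}

func main() {
	scopes, err := findEmptyScopes("/sys/fs/cgroup/systemd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	for _, scope := range scopes {
		fmt.Println("empty scope:", scope)
	}
}
```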

@knisbet knisbet requested review from a team, r0mant and a-palchikov March 11, 2020 03:25
tool/planet/agent.go (outdated)
tool/planet/cgroup.go
tool/planet/cgroup.go (outdated)

var paths []string

baseTime := time.Now().Add(-time.Minute) // cutoff: only scopes older than this are eligible for cleanup
Reviewer (Contributor):

Do you think a 1 minute interval is enough? Maybe, to be on the safe side, make it something like an hour, or at least 10 minutes?

Author (Contributor):

I was thinking the likelihood of a race here is something like a few ns, maybe a few ms if the system is busy. This should just be systemd creating a scope, which starts out empty, and then placing the launched process into that cgroup. So the window is extremely small, and a minute is already overkill. If the logic here is incorrect and this stops scopes it shouldn't, I don't see much difference between stopping a scope that is 1 minute old and one that is 1 hour old in terms of the problems it would cause, except that the 1 minute threshold makes the bug a lot more apparent.

I suppose the potential for a false positive is a process inside planet creating a scope and then not using it for a few minutes, in which case this cleaner would come along and remove it out from under it.

It would be possible to get the scope object from systemd first and match specifically against Kubernetes mounts, but I was trying to keep the implementation relatively simple.
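
To make the trade-off discussed above concrete, here is a hedged sketch of how the grace period could gate the actual cleanup: a scope is only stopped when it is both empty and older than the cutoff, so a scope that systemd has just created (and not yet populated with a pid) is left alone. The helper name, the unit-name derivation, and the use of systemctl are illustrative assumptions, not the code merged in this PR:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// maybeStopScope (hypothetical) stops a scope only if it has no pids and has
// existed for at least minAge, guarding against the race where systemd has
// created the scope but not yet moved the launched process into it.
func maybeStopScope(cgroupPath string, minAge time.Duration) error {
	info, err := os.Stat(cgroupPath)
	if err != nil {
		return nil // already gone, nothing to do
	}
	if time.Since(info.ModTime()) < minAge {
		return nil // too new: give systemd time to place the pid
	}
	procs, err := os.ReadFile(filepath.Join(cgroupPath, "cgroup.procs"))
	if err != nil || strings.TrimSpace(string(procs)) != "" {
		return nil // unreadable or still in use
	}
	// Ask systemd to stop the transient unit; the unit name is assumed to be
	// the base name of the cgroup directory, e.g. "run-r0123abcd.scope".
	unit := filepath.Base(cgroupPath)
	if out, err := exec.Command("systemctl", "stop", unit).CombinedOutput(); err != nil {
		return fmt.Errorf("stopping %s: %v: %s", unit, err, out)
	}
	return nil
}

func main() {
	scope := "/sys/fs/cgroup/systemd/system.slice/run-r0123456789abcdef.scope" // example path
	if err := maybeStopScope(scope, time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```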

tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
tool/planet/cgroup.go (outdated)
@knisbet knisbet merged commit 00ed8e6 into master Mar 13, 2020
@knisbet knisbet deleted the kevin/master/1219-fix-dangling-cgroups branch March 13, 2020 15:23
knisbet pushed a commit that referenced this pull request Mar 17, 2020
* Implement workaround to clean up leaking cgroups

* change logging level for cgroup cleanup

* address review feedback

* address review feedback

(cherry picked from commit 00ed8e6)
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
@knisbet knisbet mentioned this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
knisbet pushed a commit that referenced this pull request Mar 17, 2020
3 participants