Add cgroup support #734
Conversation
Force-pushed from 959b4c8 to bb94d65 (Compare)
Replaces #416
Note 1: Currently I imported "github.com/WeiZhang555/cgroups"; this repo is actually a fork of the master branch of

Note 2:

Combining the issues above, I will suggest:
Force-pushed from ef0d364 to b121ac4 (Compare)
Because there's cgroup support in kata-agent, we need to do some tricks to verify this really works.

Test steps with docker:
Run `top` on the host; you can see the qemu process takes about 60% CPU usage, then enter

Test steps with k8s + cri-containerd:
Use this pod spec:
There will be 2 containers inside the POD.

Issue found: When tested with cri-containerd and the above pod spec, the resource limit in the cgroup is set to 0.7 core as expected, but the VM got 3 vcpus, which is not right. I guess that's because we hotplugged two vcpus for the two containers, plus the default one vcpu. We should do a more accurate calculation of vcpu numbers.
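The over-allocation described above can be reproduced with simple arithmetic. A minimal Go sketch (hypothetical helper, not the PR's actual code) of rounding each container's CFS quota up to whole vCPUs, which yields the observed 3 vCPUs (1 default + 1 per 0.7-core container):

```go
package main

import "fmt"

// vcpusFromQuota is a hypothetical helper (not kata's actual code):
// it derives how many vCPUs a container's CFS quota asks for, rounding
// up so that 0.7 core still requires one whole vCPU.
func vcpusFromQuota(quota, period int64) int64 {
	if quota <= 0 || period <= 0 {
		return 0 // unlimited or unset: nothing to hotplug
	}
	return (quota + period - 1) / period // ceil(quota/period)
}

func main() {
	total := int64(1) // one default vCPU
	// Two containers, each limited to 0.7 core (quota 70000, period 100000).
	for _, quota := range []int64{70000, 70000} {
		total += vcpusFromQuota(quota, 100000)
	}
	fmt.Println(total) // the 3-vCPU over-allocation described above
}
```

A more accurate calculation would round up the sandbox-wide sum of quotas instead of rounding per container.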
Force-pushed from 2a6fede to a231690 (Compare)
PSS Measurement:
Memory inside container:
Force-pushed from a231690 to 90aca15 (Compare)
PSS Measurement:
Memory inside container:
Force-pushed from 90aca15 to 9ba8baf (Compare)
PSS Measurement:
Memory inside container:
Codecov Report
```
@@            Coverage Diff             @@
##           master     #734      +/-   ##
==========================================
- Coverage   66.09%   65.75%   -0.35%
==========================================
  Files          87       88       +1
  Lines       10705    10685      -20
==========================================
- Hits         7076     7026      -50
- Misses       2897     2919      +22
- Partials      732      740       +8
```
A huge PR, but it is really something we need. I'll take a closer look later today. Thanks @WeiZhang555!
Generally looks good. A few comments inline.
```diff
@@ -31,7 +31,9 @@ import (

 // vmStartTimeout represents the time in seconds a sandbox can wait before
 // to consider the VM starting operation failed.
-const vmStartTimeout = 10
+const (
+	vmStartTimeout = 10
```
Please move the comments as well if the braces are intentional.
Got it!
```diff
@@ -122,6 +122,11 @@ func createSandboxFromConfig(ctx context.Context, sandboxConfig SandboxConfig, f
 		return nil, err
 	}

+	// Setup host cgroups
+	if err := s.setupCgroups(); err != nil {
```
This is already called in `createContainers()`. Are you trying to make sure there is a host CPU cgroup even for an empty sandbox?
No, this `s.setupCgroups()` in `createSandboxFromConfig` is for the initial sandbox container; in this sandbox it has exactly one container. The other call of `s.setupCgroups()` is in `s.CreateContainer()`. The setup process only happens once for each container (including the sandbox container), so there's no duplicate call.
@WeiZhang555 There is a code path `createSandboxFromConfig` -> `s.createContainers` -> `createContainer` -> `setupCgroups` for the initial container in the sandbox as well.
`createContainer` -> `setupCgroups` doesn't exist. Maybe you mixed `createContainer()` from virtcontainers/container.go with the one from cli/create.go?
Oops, you are right! I mixed up the two createContainer() functions in sandbox.go and container.go, though they differ by an initial capital. Sorry for the noise...
```go
// TODO: how to handle empty/unlimited resource?
// maybe we should add a default CPU/Memory delta when no
// resource limit is given. -- @WeiZhang555
```
Default cpu quota might conflict globally on the host if we exceed the total available CPU time. While users might set it to exceed, we should not make it happen unintentionally. docker/runc does not set default quota for containers either.
The worry is: if there are two containers inside a POD, one with quota 6000 and the other with quota -1 (unlimited), the total quota will be 6000.
That means with my calculation, one container with 0.6 core + one container with unlimited cores = 0.6 core.
I'm not sure this can satisfy everyone; I can only suggest to users that if you set a limit for one container, you should set limits for every container too...
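The summing behaviour described above can be sketched as follows (hypothetical helper, not the PR's actual code); a quota less than or equal to 0 means unlimited and contributes nothing to the sandbox total:

```go
package main

import "fmt"

// sumSandboxQuota is a sketch of the summing concern described above
// (hypothetical helper, not kata's actual code): unlimited containers
// (quota <= 0) are skipped, so a 0.6-core container plus an unlimited
// container yields only 0.6 core for the whole sandbox.
func sumSandboxQuota(quotas []int64) int64 {
	var total int64
	for _, q := range quotas {
		if q > 0 {
			total += q
		}
	}
	return total
}

func main() {
	// One container with quota 6000 plus one unlimited container (-1):
	// the sandbox total stays at 6000 (0.6 core with period 10000).
	fmt.Println(sumSandboxQuota([]int64{6000, -1}))
}
```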
please file an issue
virtcontainers/cgroups.go (Outdated)

```go
// so use previous period 10000 as a baseline, container B
// has proportional resource of quota 4000 and period 10000, calculated as
// delta := 40 / 100 * 10000 = 4000
// and final `*resource.CPU.Quota` = 5000 + 4000 = 9000
```
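The comment's proportional calculation can be sketched as a standalone Go function (an illustration of the described math, not the code from the PR):

```go
package main

import "fmt"

// addProportionalQuota rescales a container's quota to the sandbox's
// baseline period before adding it, as in the comment above. This is a
// sketch of the described calculation, not the PR's actual code.
func addProportionalQuota(sandboxQuota, basePeriod, quota, period int64) int64 {
	if quota <= 0 || period <= 0 {
		return sandboxQuota // unlimited: contributes nothing
	}
	// delta := quota / period * basePeriod, e.g. 4000/10000 * 10000 = 4000
	return sandboxQuota + quota*basePeriod/period
}

func main() {
	// Container A contributed quota 5000 at the baseline period 10000;
	// container B has quota 4000 and period 10000, so the final sandbox
	// quota is 5000 + 4000 = 9000, matching the comment.
	fmt.Println(addProportionalQuota(5000, 10000, 4000, 10000))
}
```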
Nice! I like the proportional calculation and the comments!
@WeiZhang555 Nice patch, thanks! W.r.t. the containerd bug, is it possible to push containerd/cgroups#54 forward? Also, I think we should update our spec version even though we do not support rdma right now. Then we can import from containerd directly ;)
I think this is achievable, I just can't be sure how long it will take.
I need to check whether bumping runtime-spec breaks anything. Currently kata-runtime and kata-agent are using the latest stable release v1.0.1, and I think we have more reason to keep it at v1.0.1 since it's a stable release, compared to an in-development master branch.
We bumped the spec in containerd; you shouldn't have any backward incompatibility issues.
Hi @crosbymichael, thank you for your response! Then I think bumping kata-runtime's runtime-spec version is the right way; kata-runtime/agent should use the same/similar version of the runtime spec as containerd 😄
@WeiZhang555 ping from your weekly Kata herder.
Force-pushed from a280af6 to 1abfb00 (Compare)
Rebased. Ping @devimc, what do you think of this #734 (comment)?
/test
```go
// immediately as default behaviour.
if len(tids.vcpus) > 0 {
	if err := s.cgroup.sandboxSub.Add(cgroups.Process{
		Pid: tids.vcpus[0],
```
So are vcpus[] thread ids or cpu ids? I'm a little confused.
They are thread ids, not CPU ids.
Add new vendor library "github.com/containerd/cgroups"

commit: 5017d4e9a9cf2d4381db99eacd9baf84b95bfb14

This library is for host cgroup support in the next commit.

Signed-off-by: Wei Zhang <[email protected]>
Fixes kata-containers#344

Add host cgroup support for kata.

This commit only adds cpu.cfs_period and cpu.cfs_quota support.
It will create a 3-level hierarchy; take the "cpu" cgroup as an example:

```
/sys/fs/cgroup
|---cpu
    |---kata
        |---<sandbox-id>
            |---vcpu
```

* `kata` cgroup is the common parent for all kata-container sandboxes; it won't be removed after a sandbox is removed. This cgroup has no limitation.
* `<sandbox-id>` cgroup is the layer for each sandbox; it contains all qemu threads other than the vcpu threads. In the future, we can consider putting all shim processes and the proxy process here. This cgroup has no limitation yet.
* `vcpu` cgroup contains the vcpu threads from qemu. Currently the cpu quota and period constraints apply to this cgroup.

Signed-off-by: Wei Zhang <[email protected]>
Signed-off-by: Jingxiao Lu <[email protected]>
Force-pushed from 1abfb00 to 34fe3b9 (Compare)
/test
CI passed. Merging this, as we have enough LGTMs and it has been pending for a long time.
Yeah! Congrats |
@liangxianlong I'm not sure what you mean by "reuse" my code. If you want to reuse kata-runtime code to implement a new feature, please go ahead and do it! This is an open source project and you're welcome to use it and contribute!
@WeiZhang555 I tested your code. I ran this command: "docker run -ti --cpuset-cpus 1 busybox /bin/sh", and then I saw this directory on my host: "/sys/fs/cgroup/cpuset/kata/e76fe6c34d9e75333b06091e0a68095a470ba4333c3de4440e99010029f1674a/vcpu". But if we have two vcpus, I think the directories should look like "/sys/fs/cgroup/cpuset/kata/e76fe6c34d9e75333b06091e0a68095a470ba4333c3de4440e99010029f1674a/vcpu0" and "/sys/fs/cgroup/cpuset/kata/e76fe6c34d9e75333b06091e0a68095a470ba4333c3de4440e99010029f1674a/vcpu1".
@liangxianlong So you are trying to support cpuset. Currently only cfs_quota and cfs_period are supported; cpuset support could be more complicated, as it depends on our cgroup setting policy.
@WeiZhang555 For now, I don't care about the container. In my test, after "docker run -ti --cpuset-cpus 1 busybox /bin/sh", there are two results: (1) the container's process in the VM is bound to vcpu1, and (2) a directory is created on the host: "/sys/fs/cgroup/cpuset/kata/${sandbox-id}/vcpu". Each vcpu is just a qemu thread, so if we want to bind a vcpu to a physical cpu, does the code need some modifications?
@liangxianlong This is achievable; you can enhance the code to add cpuset support, it should be easy.
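One possible shape for such a cpuset enhancement, sketched as a pure helper (every name here is hypothetical, and actually writing the files requires root, so this only computes the path -> contents mapping for per-vcpu directories like vcpu0, vcpu1):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// vcpuCpusetFiles computes the cgroup files that would pin one vcpu
// thread to a physical CPU under a hypothetical per-vcpu cpuset
// directory. This is an illustration of the idea discussed above, not
// kata's actual code; writing the files is left to a privileged caller.
func vcpuCpusetFiles(sandboxID string, idx, tid, cpu int) map[string]string {
	dir := filepath.Join("/sys/fs/cgroup/cpuset/kata", sandboxID,
		fmt.Sprintf("vcpu%d", idx))
	return map[string]string{
		// restrict this sub-cgroup to one physical CPU
		filepath.Join(dir, "cpuset.cpus"): strconv.Itoa(cpu),
		// move the vcpu thread (a TID) into the sub-cgroup
		filepath.Join(dir, "tasks"): strconv.Itoa(tid),
	}
}

func main() {
	for path, val := range vcpuCpusetFiles("sandbox-id", 0, 12345, 1) {
		fmt.Println(path, "<-", val)
	}
}
```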
Thanks. Another question: if I run "docker run -ti busybox /bin/sh", two directories will be created on my host,
@liangxianlong That's because

By the way, it's better to open another issue for discussing and tracking this; discussion under a closed PR may be ignored by other people, and this is not the right place 😄
@WeiZhang555 Thanks! I'm new to Kata and interested in it, so please bear with me. Hehe.
@WeiZhang555 Regarding this PR, I asked a question, please take a look, thank you. |
cgroups: add host cgroup support