-
Notifications
You must be signed in to change notification settings - Fork 112
cgroup: ignore cpuset if can not be applied. #447
Conversation
cgroup.go
Outdated
return "", err | ||
} | ||
|
||
cpusetGuestList, err := parsers.ParseUintList(strings.TrimSpace(string(cpusetGuest))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpusetGuest is a string, why is string()
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I did not removed when factored out getCpusetGuest()
for k := range cpusetListReq { | ||
if !cpusetGuestList[k] { | ||
agentLog.Warnf("cpu %d is not in guest cpuset, using %s cpuset", k, cpusetGuest) | ||
return cpusetGuest, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getAvailableCpusetList
returns all the cpus when a cpu in the list of cpuset is not available in the guest, for example if cpuset is 7, but the available cpus in the guest are 0-6, this function will return 0-6 even when only one cpu is requested. I wonder if this implementation is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it is difficult to decide if correct or not. Because it is traying to make more compatible a broken use case in Kata that is a current limitation because we run the workload in VM with a different set of CPUs
I prefer to not apply the cpuset at all instead of apply only where just a subset of them, why?
- the workload is still limited to use limited cpu time based on cpu cgroup (cpu and qouta) so still be safe not consuming cpu time of other process.
for example if cpuset is 7, but the available cpus in the guest are 0-6, this function will return 0-6 even when only one cpu is requested
Lets translate to a docker example.
docker run -ti --runtime kata-runtime --cpuset-cpus 7 --cpus 6 --net none --rm busybox
/ # cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-5
- On the host the VM will run only on this cpu, so the workload as well with virtcontainers: reimplement sandbox cgroup runtime#1189
- On the guest workload, the cpuset will allow to run the workload in all the vcpus (think cpuset on the host is still isolating our workload to jump to other process.
- From user perspective on the host this is still correct because the VM is only using only the CPU 7
Another case:
docker run -tid --cpuset-cpu 1,5-9 --cpus 4 mycontainer
- The cpuset list is valid on the host and thanks to virtcontainers: reimplement sandbox cgroup runtime#1189 this is going to be limited on the host and the VM is allow to have its 4cpu time on any of the 1,5-9 host cpus (we isolate the workload)
- In the guest we will have 4 cpus, and cpuset list of 1,5-9, where only cpu 1 is available.
There was a few options.
- Create a mapping for each cpu and vcpu, in this case is not even possible because host has at least 10 cpus and the guest hast 4.
- Apply only the ones that are available, and ignore the rest, in this case the only cpu available is
1
so the workload would be able to run only on vcpu 1 but there rest of them wont be used by it, this mean waste of vcpus and the workload limited from the cpuset we partially apply to not get the cpu time of 4 cpus. - If the request has cpus that are not part of it, just ignore it. The cpuset is not applied on the guest and workload will be jumping around the guest vcpus but the host cpuset will prevent the VM going to not allowed host cpus, and the workload will be still limited to use 4 cpus because its cpu cgroup ( period and quota).
So I went for the last option where it if is possible the cpuset is applied. But if not there is still isolation at host level. The solution is not perfect but solve a very basic docker use case and works with cpu cgroup and cpuset host cgroup to work in a more compatible way.
docker run -tid --cpuset-cpu 2-4 --cpus 2 mycontainer
And should be the same for kubernetes when it isolate workloads to use a set of cpus. I still need to check this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cpuset itself needs to be translated before passing into the guest, imo. As it is provided, it is only valid for physical CPUs. Once cpuset is done on the host (kata-containers/runtime#1189), then this PR becomes relevant. IMO, after the runtime pins on the host, the cpuset information needs to be translated to map from physical CPUs to virtual CPUs in the guest. We shouldn't just pass the original request in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egernst agree. I work on follow up PRs to do this, this is only allowing to run Kata on K8s static policy and docker.
documentation/features/cpuset.md
Outdated
|
||
This is becasue the request from the agent client (the Kata runtime) pass | ||
cpusets that were requested to be placed on the host. This is to isolate the | ||
container workload on some specific host CPUs. The the runtime pass the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The the runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
27a6ec2
to
4d1f54e
Compare
a70ea5b
to
3ed31d7
Compare
Updated PR to fix issue found testing commit with Kuberentes (multi containers in a pod). This adds fix the agent when try to update cgroup parents with with the request. The parents now are updated with the max set of cpus in the guest. |
Just going to try and tickle pullapprove on this, as we are testing it... and I need something not on the proxy repo. Going temporarily push a DNM label on it.. |
🐟 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested changes from a grammar/structure/flow review. Thanks!
9560c23
to
e321f77
Compare
@klynnrif thank you! now looks better and easier to understand :) |
/test |
049cff5
to
fee39b1
Compare
/test |
a0d4fcc
to
836c04c
Compare
/test |
/zuul-recheck |
cgroup.go
Outdated
|
||
for k := range cpusetListReq { | ||
if !cpusetGuestList[k] { | ||
agentLog.Warnf("cpuset '%s': cpu %d is not in guest cpu list, using guest cpus %s", cpusetReq, k, cpusetGuest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WithFields
.. please 😄
cgroup.go
Outdated
} | ||
|
||
// All the cpus are valid keep the same cpuset string | ||
agentLog.Debugf("the requested cpuset '%s' is valid using it", cpusetReq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WithFields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, running again CI
836c04c
to
31a745a
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jcvenegas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but looks good!
Kata will try agent will evaluate this request and see if is possible to | ||
apply the cpuset request on the guest. | ||
The Kata agent evaluates the request to see if it is possible to apply the | ||
cpuset request onto the guest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you fix this directly on the first commit introducing this new documentation?
{"1", fakeGuestCpuset}, | ||
{"1,3", "2,13"}, | ||
{"1", "1"}, | ||
{"1,3", "1,3"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you fix this directly on the first commit introducing those tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reorganized the commits
@@ -179,7 +179,7 @@ func updateContainerCpuset(cgroupPath string, newCpuset string, cookies cookie) | |||
// Don't use c.container.Set because of it will modify container's config. | |||
// c.container.Set MUST BE used only on update. | |||
cpusetCpusPath := filepath.Join(cpusetPath, "cpuset.cpus") | |||
agentLog.WithField("path", cpusetPath).Debug("updating cpuset cgroup") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you remove this log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably miss to re-add when I move the code, I added back
The way that kata works is hotplugging cpus to the VM based in cpu and qouta, cpuset request may not match with the cpus requested from the host. Apply only if possible. Fixes: kata-containers#446 Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
When the agent is requested to update cpuset, it also updates the cpuset for the parents with the cpuset requested. This is an issue when more than one container in the pod are running. Because containers may share the cgroup parent. To avoid those issues update the parents with max cpus. And containers cpusets whith the value requested. Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
31a745a
to
67b2559
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jcvenegas, lgtm
The way that kata works is hotplugging cpus to the VM
based in cpu and qouta, cpuset request may not match
with the cpus requested from the host. Apply
only if possible.
Cpuset cgroup.
From Kernel documentation:
" Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to
a set of tasks."
The Kata agent brings compatibility the cgroup cpuset CPU on the guest side.
The cpuset CPU cgroup will be applied on two events:
containers creation.
container update.
When the runtime request to apply cpuset cgroup to the agent, the amout of
vcpus available may be not the same to the required in the request.
This is becasue the request from the agent client (the Kata runtime) pass
cpusets that were requested to be placed on the host. This is to isolate the
container workload on some specific host CPUs. The the runtime pass the
requested cpuset to the agent, that will try to apply the cgroup cpuset on the
guest.
Because the runtime will only hot-plug and amount the CPUs calculated based in
the container period and quota, the VM wont have the same amount CPUs than the host.
guest.
Example:
docker run -ti --cpus 2 --cpuset 0,1 busybox
The expected result is that the container is limited to the time of 2 CPUs, but
is only allowed to be scheduled on CPUs 0 and 1
In a similar case and a valid traditional container is the following:
docker run -ti --cpus 2 --cpuset 2,3,4 busybox
Where the container is limited to 2 CPUs and can be scheduled on CPU 2,3,4.
In the case of the Kata. The Kata runtime will only hotplug 2 cpus, making
imposible to request to the guest kernel to schedule the workload on vcpu
3,4.
cpuset best effort application.
Kata will try agent will evaluate this request and see if is possible to
apply the cpuset request on the guest.
will ignore the request.
applied in the guest.
Fixes: #446
Signed-off-by: Jose Carlos Venegas Munoz [email protected]