Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

cgroup: ignore cpuset if can not be applied. #447

Merged
merged 2 commits into from
Feb 18, 2019

Conversation

jcvenegas
Copy link
Member

The way that kata works is hotplugging cpus to the VM
based in cpu and qouta, cpuset request may not match
with the cpus requested from the host. Apply
only if possible.

Cpuset cgroup.

From Kernel documentation:

" Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to
a set of tasks."

The Kata agent brings compatibility the cgroup cpuset CPU on the guest side.

The cpuset CPU cgroup will be applied on two events:

  • containers creation.

  • container update.

When the runtime request to apply cpuset cgroup to the agent, the amout of
vcpus available may be not the same to the required in the request.

This is becasue the request from the agent client (the Kata runtime) pass
cpusets that were requested to be placed on the host. This is to isolate the
container workload on some specific host CPUs. The the runtime pass the
requested cpuset to the agent, that will try to apply the cgroup cpuset on the
guest.

Because the runtime will only hot-plug and amount the CPUs calculated based in
the container period and quota, the VM wont have the same amount CPUs than the host.
guest.

Example:

docker run -ti --cpus 2 --cpuset 0,1 busybox

The expected result is that the container is limited to the time of 2 CPUs, but
is only allowed to be scheduled on CPUs 0 and 1

In a similar case and a valid traditional container is the following:

docker run -ti --cpus 2 --cpuset 2,3,4 busybox

Where the container is limited to 2 CPUs and can be scheduled on CPU 2,3,4.

In the case of the Kata. The Kata runtime will only hotplug 2 cpus, making
imposible to request to the guest kernel to schedule the workload on vcpu
3,4.

cpuset best effort application.

Kata will try agent will evaluate this request and see if is possible to
apply the cpuset request on the guest.

  • If the cpuset request includes CPUs that are not available in the guest it
    will ignore the request.
  • If the request has the CPUs requested in are available the request be
    applied in the guest.

Fixes: #446

Signed-off-by: Jose Carlos Venegas Munoz [email protected]

@jcvenegas jcvenegas requested review from devimc, sboeuf and gnawux January 28, 2019 22:24
cgroup.go Outdated
return "", err
}

cpusetGuestList, err := parsers.ParseUintList(strings.TrimSpace(string(cpusetGuest)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpusetGuest is a string, why is string() needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I did not removed when factored out getCpusetGuest()

for k := range cpusetListReq {
if !cpusetGuestList[k] {
agentLog.Warnf("cpu %d is not in guest cpuset, using %s cpuset", k, cpusetGuest)
return cpusetGuest, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAvailableCpusetList returns all the cpus when a cpu in the list of cpuset is not available in the guest, for example if cpuset is 7, but the available cpus in the guest are 0-6, this function will return 0-6 even when only one cpu is requested. I wonder if this implementation is correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is difficult to decide if correct or not. Because it is traying to make more compatible a broken use case in Kata that is a current limitation because we run the workload in VM with a different set of CPUs

I prefer to not apply the cpuset at all instead of apply only where just a subset of them, why?

  • the workload is still limited to use limited cpu time based on cpu cgroup (cpu and qouta) so still be safe not consuming cpu time of other process.

for example if cpuset is 7, but the available cpus in the guest are 0-6, this function will return 0-6 even when only one cpu is requested

Lets translate to a docker example.

docker run -ti --runtime kata-runtime  --cpuset-cpus 7 --cpus 6  --net none --rm busybox
/ # cat /proc/self/status  | grep Cpus_allowed_list
Cpus_allowed_list:      0-5
  1. On the host the VM will run only on this cpu, so the workload as well with virtcontainers: reimplement sandbox cgroup runtime#1189
  2. On the guest workload, the cpuset will allow to run the workload in all the vcpus (think cpuset on the host is still isolating our workload to jump to other process.
  3. From user perspective on the host this is still correct because the VM is only using only the CPU 7

Another case:

docker run -tid --cpuset-cpu 1,5-9  --cpus 4 mycontainer
  1. The cpuset list is valid on the host and thanks to virtcontainers: reimplement sandbox cgroup runtime#1189 this is going to be limited on the host and the VM is allow to have its 4cpu time on any of the 1,5-9 host cpus (we isolate the workload)
  2. In the guest we will have 4 cpus, and cpuset list of 1,5-9, where only cpu 1 is available.

There was a few options.

  • Create a mapping for each cpu and vcpu, in this case is not even possible because host has at least 10 cpus and the guest hast 4.
  • Apply only the ones that are available, and ignore the rest, in this case the only cpu available is 1 so the workload would be able to run only on vcpu 1 but there rest of them wont be used by it, this mean waste of vcpus and the workload limited from the cpuset we partially apply to not get the cpu time of 4 cpus.
  • If the request has cpus that are not part of it, just ignore it. The cpuset is not applied on the guest and workload will be jumping around the guest vcpus but the host cpuset will prevent the VM going to not allowed host cpus, and the workload will be still limited to use 4 cpus because its cpu cgroup ( period and quota).

So I went for the last option where it if is possible the cpuset is applied. But if not there is still isolation at host level. The solution is not perfect but solve a very basic docker use case and works with cpu cgroup and cpuset host cgroup to work in a more compatible way.

docker run -tid --cpuset-cpu 2-4  --cpus 2 mycontainer

And should be the same for kubernetes when it isolate workloads to use a set of cpus. I still need to check this use case.

Copy link
Member

@egernst egernst Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cpuset itself needs to be translated before passing into the guest, imo. As it is provided, it is only valid for physical CPUs. Once cpuset is done on the host (kata-containers/runtime#1189), then this PR becomes relevant. IMO, after the runtime pins on the host, the cpuset information needs to be translated to map from physical CPUs to virtual CPUs in the guest. We shouldn't just pass the original request in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egernst agree. I work on follow up PRs to do this, this is only allowing to run Kata on K8s static policy and docker.


This is becasue the request from the agent client (the Kata runtime) pass
cpusets that were requested to be placed on the host. This is to isolate the
container workload on some specific host CPUs. The the runtime pass the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The the runtime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jcvenegas jcvenegas requested a review from a team as a code owner February 1, 2019 20:36
@jcvenegas jcvenegas force-pushed the cpuset-ignore branch 2 times, most recently from a70ea5b to 3ed31d7 Compare February 1, 2019 20:40
@jcvenegas
Copy link
Member Author

Updated PR to fix issue found testing commit with Kuberentes (multi containers in a pod). This adds fix the agent when try to update cgroup parents with with the request. The parents now are updated with the max set of cpus in the guest.

@grahamwhaley
Copy link
Contributor

Just going to try and tickle pullapprove on this, as we are testing it... and I need something not on the proxy repo. Going temporarily push a DNM label on it..

@jodh-intel
Copy link
Contributor

🐟

Copy link

@klynnrif klynnrif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested changes from a grammar/structure/flow review. Thanks!

documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
documentation/features/cpuset.md Outdated Show resolved Hide resolved
@jcvenegas jcvenegas force-pushed the cpuset-ignore branch 3 times, most recently from 9560c23 to e321f77 Compare February 6, 2019 18:23
@jcvenegas
Copy link
Member Author

@klynnrif thank you! now looks better and easier to understand :)

@jcvenegas
Copy link
Member Author

/test

klynnrif
klynnrif previously approved these changes Feb 11, 2019
@jcvenegas jcvenegas force-pushed the cpuset-ignore branch 4 times, most recently from 049cff5 to fee39b1 Compare February 14, 2019 17:57
@jcvenegas
Copy link
Member Author

/test

@jcvenegas jcvenegas force-pushed the cpuset-ignore branch 2 times, most recently from a0d4fcc to 836c04c Compare February 14, 2019 22:02
@jcvenegas
Copy link
Member Author

/test

@ttx
Copy link
Member

ttx commented Feb 15, 2019

/zuul-recheck

@ttx ttx added wip and removed wip labels Feb 15, 2019
cgroup.go Outdated

for k := range cpusetListReq {
if !cpusetGuestList[k] {
agentLog.Warnf("cpuset '%s': cpu %d is not in guest cpu list, using guest cpus %s", cpusetReq, k, cpusetGuest)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithFields.. please 😄

cgroup.go Outdated
}

// All the cpus are valid keep the same cpuset string
agentLog.Debugf("the requested cpuset '%s' is valid using it", cpusetReq)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithFields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, running again CI

@jcvenegas
Copy link
Member Author

/test

devimc
devimc previously approved these changes Feb 15, 2019
Copy link

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jcvenegas

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but looks good!

Kata will try agent will evaluate this request and see if is possible to
apply the cpuset request on the guest.
The Kata agent evaluates the request to see if it is possible to apply the
cpuset request onto the guest.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you fix this directly on the first commit introducing this new documentation?

{"1", fakeGuestCpuset},
{"1,3", "2,13"},
{"1", "1"},
{"1,3", "1,3"},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you fix this directly on the first commit introducing those tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reorganized the commits

@@ -179,7 +179,7 @@ func updateContainerCpuset(cgroupPath string, newCpuset string, cookies cookie)
// Don't use c.container.Set because of it will modify container's config.
// c.container.Set MUST BE used only on update.
cpusetCpusPath := filepath.Join(cpusetPath, "cpuset.cpus")
agentLog.WithField("path", cpusetPath).Debug("updating cpuset cgroup")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you remove this log?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably miss to re-add when I move the code, I added back

The way that kata works is hotplugging cpus to the VM
based in cpu and qouta, cpuset request may not match
with the cpus requested from the host. Apply
only if possible.

Fixes: kata-containers#446

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
When the agent is requested to update cpuset, it
also updates the cpuset for the parents with the cpuset
requested.

This is an issue when more than one container in the
pod are running. Because containers may share the cgroup
parent. To avoid those issues update the parents
with max cpus. And containers cpusets whith the value
requested.

Signed-off-by: Jose Carlos Venegas Munoz <[email protected]>
@jcvenegas
Copy link
Member Author

/test

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jcvenegas, lgtm

@devimc devimc merged commit e8ca07f into kata-containers:master Feb 18, 2019
@egernst egernst mentioned this pull request Feb 26, 2019
@jcvenegas jcvenegas deleted the cpuset-ignore branch February 13, 2020 13:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants