Improve detection of CPU limits when running inside a container #11933
One more note: `cpus` can be passed in as fractions of cores.
The two Docker CLI options that affect this issue are `--cpus` and `--cpuset-cpus`.

The runtime components depending on the number of processors available are the ThreadPool, the GC, `Environment.ProcessorCount`, `SimpleRWLock::m_spinCount`, and `BaseDomain::m_iNumberOfProcessors`. All components but `Environment.ProcessorCount` are aware of and take advantage of the values passed to those options.

dotnet/coreclr#12797 has already been done. It impacts all of the above runtime components, allowing them to optimize performance in a container/machine with limited resources and make the best use of what is available. In the case of `Environment.ProcessorCount`, I would argue the current behavior is erroneous, since the container still has access to the full range of processors and only its processor time is limited.

The work has been done here for all runtime components except `Environment.ProcessorCount`.
Just to clarify: from reading the docs, it seems it is just a CPU quota weighted over the total number of cores. Basically you may still utilize 4 threads, but will observe that each runs at roughly 1/4 of the normal speed. Is that the right understanding?
Exactly.
Ah, ok, I didn't know that. I was assuming that it affinitizes the process to the minimum number of processors needed and adds some throttling. Since that is not the case, it seems we should ignore the `--cpus` value.
Even with a reduced quota you may want to use threads, if that makes you more efficient or more responsive.
From the GC's POV, if `--cpus` is specified to only use M cores out of N, we would want to create only M heaps (not affinitized to any specific CPUs).
`--cpus` does not need to be an int and therefore has no direct implication for cores, @Maoni0. I think the explanation of slowing down to an N-core equivalent is a good one.
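As context for how a fractional `--cpus` value reaches the process: Docker writes it into the cgroup CFS controller as a quota/period pair, and the fractional budget is their ratio. A minimal sketch in C, assuming cgroup v1 mount paths; the function names here are illustrative, not the runtime's actual code:

```c
#include <stdio.h>

/* Illustrative: --cpus=N is stored as cfs_quota_us / cfs_period_us,
 * so the fractional CPU budget is simply their ratio. */
static double budget_from(long quota_us, long period_us)
{
    if (quota_us <= 0 || period_us <= 0)
        return -1.0;                        /* no quota configured */
    return (double)quota_us / (double)period_us;
}

/* Read the pair from the assumed cgroup v1 paths. */
static double cpu_budget(void)
{
    long quota = -1, period = -1;
    FILE *f;
    if ((f = fopen("/sys/fs/cgroup/cpu/cpu.cfs_quota_us", "r"))) {
        if (fscanf(f, "%ld", &quota) != 1) quota = -1;
        fclose(f);
    }
    if ((f = fopen("/sys/fs/cgroup/cpu/cpu.cfs_period_us", "r"))) {
        if (fscanf(f, "%ld", &period) != 1) period = -1;
        fclose(f);
    }
    return budget_from(quota, period);      /* --cpus=1.8 gives 1.8 */
}
```

With Docker's default 100ms period, `--cpus=1.8` shows up as a quota of 180000µs, i.e. a budget of 1.8 regardless of how many cores the host has.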
Running locally, I get the following:
To summarize, I'll do the following:
@stephentoub, what are your thoughts on adding an API exposing the CPU quota?
Having more information is generally better, but it is not easy to see how it would be used. Perhaps SpinWait and similar could use it to reduce spinning vs. sleeping, since we effectively get a slower CPU?
cc: @jkotas & @tmds as participants of https://github.com/dotnet/corefx/issues/25193
@VSadov it could be used mostly to more accurately estimate the optimal number of threads, the same way it's used by the GC and the ThreadPool to better use available resources.
Could you pick a few of the current uses of ProcessorCount and show how the quota would be used and what the benefit would be?
There are actually two code paths in `SystemNative::GetProcessorCount`. One is for the case when NUMA support is enabled in the runtime (`CPUGroupInfo::CanEnableThreadUseAllCpuGroups()` returns TRUE) and one for the other case. When NUMA is not enabled, we use the value returned from `GetSystemInfo` in `dwNumberOfProcessors`. That value will also need to be changed so that it is correctly influenced by the process affinity mask.
@janvorli I am updating it. This raises the question of what should be returned when NUMA is enabled and available; I do not know, and I would love to understand better how NUMA fits in this project.
As for NUMA, it seems we could prune the CPUs reported in the bitmasks by libnuma using the thread affinity mask before we parse those bitmasks. I believe it would work fine. I say "I believe" since we cannot match it to Windows behavior, as Windows doesn't seem to have a way to limit a process to run on only a subset of CPUs when the process uses SetThreadIdealProcessorEx.
I am not a container platform expert, but I occasionally do some things with OpenShift/Kubernetes. I have only seen settings that control the CPU share (like docker `--cpus`). I haven't seen settings that restrict the number of CPUs (cf. docker `--cpuset-cpus`). Since there is no equivalent setting on those platforms, for a container platform user it may be preferable to stick to the current implementation.
There are 2 Docker CLI command-line options that we are interested in here:

- `--cpus`: limits the amount of CPU time available to the container (e.g. 1.8 means 180% CPU time, i.e. on 2 cores 90% of each core, on 4 cores 45% of each core, etc.)
- `--cpuset-cpus`: limits the number of processors we have access to on the CPU; it also specifies which specific processors we have access to, but that's irrelevant here

All the runtime components depending on the number of processors available are:

- ThreadPool
- GC
- `Environment.ProcessorCount` via `SystemNative::GetProcessorCount`
- `SimpleRWLock::m_spinCount`
- `BaseDomain::m_iNumberOfProcessors` (used to determine the GC heap to affinitize to)

All components but `Environment.ProcessorCount` above are aware of and take advantage of the values passed to `--cpus` and `--cpuset-cpus`.

**`--cpus`**

dotnet#12797 has already been done. It impacts all of the above runtime components via `CGroup::GetCpuLimit`, allowing them to optimize performance in a container/machine with limited resources and make the best use of what is available. In the case of `Environment.ProcessorCount`, the behavior is such that passing `--cpus=1.5` on a machine with 8 processors will return `1`, as shown in https://github.com/dotnet/coreclr/issues/22302#issuecomment-459092299. This is not consistent with [Windows Job Objects](https://docs.microsoft.com/en-us/windows/desktop/api/winnt/ns-winnt-jobobject_cpu_rate_control_information), which still reports the number of processors for the container/machine even if it only gets part of the total number of cycles. This behavior is erroneous because the container still has access to the full range of processors on the machine; only its _processor time_ is limited.

For example, on a 4-processor machine with `--cpus=1.8`, there can be 4 threads running in parallel even though each thread will only get `1.8 / 4 = 0.45`, or 45%, of the cycles of each processor. The work consists in reverting the behavior of `SystemNative::GetProcessorCount` to pre-dotnet#12797.

**`--cpuset-cpus`**

The work has been done here for all runtime components except `Environment.ProcessorCount`. The work consists in fixing `PAL_GetLogicalCpuCountFromOS` to use `sched_getaffinity`.

Fixes https://github.com/dotnet/coreclr/issues/22302
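The `--cpuset-cpus` side of the description above boils down to counting the processors in the process affinity mask rather than all online processors. A hedged sketch of what switching `PAL_GetLogicalCpuCountFromOS` to `sched_getaffinity` amounts to (the function name below is illustrative, not the PAL's):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Illustrative sketch, not the actual PAL code: count only the CPUs the
 * process is allowed to run on. With --cpuset-cpus=0,1 this returns 2
 * even on an 8-core host. */
static int available_processor_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0 /* current process */, sizeof(set), &set) == 0)
        return CPU_COUNT(&set);
    /* Fallback if the affinity call fails: all online processors. */
    return (int)sysconf(_SC_NPROCESSORS_ONLN);
}
```

Note this only catches cpuset-style limits; a pure `--cpus` cycle quota leaves the affinity mask untouched, which is exactly why the two options need separate handling.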
Does this change actually make things always better? We said in the earlier discussion that the process quota really gives you a slower processor, but it does not reduce the level of parallelism available...

I think the underlying problem all of this is trying to solve is: what is the right size of a pool or cache for a given component? The processor count works well for this on balanced machines that have a reasonable amount of memory per processor. Docker makes it easy to create environments that are imbalanced: a lot of (potential) processor power and not enough memory, or vice versa. The simplistic algorithms for computing the size of a pool or cache work poorly in these conditions. Maybe what we need here is an API specifically designed for building pools and caches that auto-adjust their sizes based on many factors.
@jkotas it doesn't necessarily give you slower processors, but it reduces the budget you have across all the processors on the machine. I posted an example at dotnet/coreclr#23398 (comment) that illustrates some other cases, like spiky workloads. If we do not go with dotnet/coreclr#23398, we then have to figure out the best value to return in cases that could be rounded up. For example, should `--cpus=1.5` return 1 or 2?
Note that the ThreadPool is not. One of the TP metrics is latency: how quickly it can dispatch bursts of tasks. Going to fewer threads could hurt that.
For this discussion, I think it is also good to get a full understanding of how scheduling in the container works. There are a couple of docs I've found. The Kubernetes doc that @tmds provided (https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) says:

The cgroups doc at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu#sect-cfs seems to be more specific about the scheduling behavior:

An even more detailed description of CFS scheduling can be found here:
I ran some ASP.NET Core benchmarks with
We can see a clear drop in performance between the runs. My hypothesis is that the difference comes down to how many processors the runtime believes it has.

[1] The Max CPU (%) should be between 0 and 100%, but in 2. and 3. it's over 100%. That's because the total time spent on CPU is divided by the reported processor count, while the threads actually ran on more cores.
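The footnote's over-100% readings can be reproduced with simple arithmetic: if total CPU time is normalized by the quota-derived processor count while the threads in fact ran across more host cores, the percentage exceeds 100. A small illustration; the helper is hypothetical, not the benchmark tool's code:

```c
/* Hypothetical helper illustrating the measurement artifact: the CPU%
 * reading depends entirely on which processor count you normalize by. */
static double cpu_percent(double cpu_seconds, double wall_seconds, int nprocs)
{
    return 100.0 * cpu_seconds / (wall_seconds * nprocs);
}
```

For instance, 3 CPU-seconds over 1 wall-second reads as 150% when normalized by 2 reported processors, but 37.5% when normalized by the 8 host cores that actually ran the threads.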
@luhenry rounding up makes sense. If your container has 1.7 CPU, it will run on more than 1 core. Your benchmark results confirm this.
dotnet/coreclr#23398 fixes the situation for the ThreadPool and the GC when passing `--cpus`. I am now verifying the case of `Environment.ProcessorCount`.
Based on the benchmark results above, I am proposing to go with rounding up in order to maximize the use of available CPU.
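The rounding policy proposed above can be sketched as follows; the clamping details and the function name are assumptions for illustration, not the final implementation:

```c
/* Sketch: round a fractional CPU budget *up* so the process keeps using
 * the extra core it can partially run on (e.g. --cpus=1.7 -> 2). */
static int processor_count_from_budget(double budget, int online_cpus)
{
    if (budget <= 0.0)
        return online_cpus;      /* no quota: report all online CPUs */
    int n = (int)budget;
    if ((double)n < budget)      /* integer ceil without libm */
        n++;
    if (n < 1) n = 1;
    if (n > online_cpus) n = online_cpus;
    return n;
}
```

Truncating instead (the pre-fix behavior) would turn `--cpus=1.7` into a single-processor view, which is exactly the drop the benchmarks showed.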
In the past, when I needed to measure changes in ThreadPool latency, I was pointed to the following snippet (not sure if the benchmark exists in some other form):
I think that the initial worker limit should not change and should be just the physical number of cores (i.e. the level of parallelism available). Where the CPU quota could be more interesting is in the starvation-detection logic. That is used to detect situations where the CPU is lightly loaded and we are not making progress with tasks (which together indicate that workers are blocked); then we allow more workers as a progress-guarantee/deadlock-prevention measure. The part where we reason about "CPU is lightly loaded" could give a biased reading when quotas are involved, since it currently expects that all cores in the affinity mask can do 100%. CC: @kouvel
As @janvorli said:
Whatever you did is IMHO not sufficient. I could give you more detailed info if you'd like. First, a description of our use case of ASP.NET Core APIs running in Kubernetes.

tl;dr: the way .NET currently works, we are forced to run our Docker containers without specifying any limits.

We are running one of our APIs in 14 instances (Docker containers spread over our Kubernetes cluster; the workers are CentOS based). The base image used is mcr.microsoft.com/dotnet/core/aspnet:2.2.6. When no CPU limits are specified, it serves around 1k HTTP requests per second, consuming about 350m of CPU per instance (kube pod). The total number of processors reported by .NET is 224 (16 per instance, since each Kubernetes worker is a VM with 16 logical CPUs).

Setting the kube CPU limit anywhere near our real usage causes .NET to report 14 CPUs (one per instance, i.e. each dotnet process behaves as if running on a single-core machine) and makes our system unable to handle its load. To cope, we have to increase scaling about three times (to 42 instances), which obviously consumes way more resources than required (memory especially). Setting the limit higher than necessary does not protect us from overconsuming resources while still limiting concurrency.

I have also tried .NET Core 3 Preview 7 and it behaves the same on Kubernetes as 2.2.6.
While the discussion was about two Docker arguments, the coreclr runtime reads these limits from cgroups. I would like to understand why setting the CPU limit near your real CPU usage causes the performance issues. I have a couple of questions:
No, it does not.
600m
We have to increase scaling about 3 times to handle the same load. I can't experiment with our Production environment as much as I would like :) I should probably add that the app is heavy on I/O, executing dozens of async network I/O operations per incoming request.
That's what I intend to test: to set the limits so that
@janvorli I did not mean to suggest that this is implemented through some Docker integration. I can see that it is done by means of kernel cgroups/CFS. I can also see from the discussion that it is a fairly complex issue. Still, setting a CPU quota is not intended to limit concurrency (at least not in Kubernetes), and the way CoreCLR tunes itself is not appropriate in all use cases.
Just to follow up on this, if anyone would feel like addressing it: I feel the current state of .NET is not suitable for Kubernetes deployments. Following is the result of changing the kube pod CPU limit from unlimited (on 16-CPU kube workers) to a fixed value. The way .NET works forces us to leave the Kubernetes CPU limit unlimited.
@mrmartan Could you please create a new GitHub issue describing the problem? Comments on closed PRs are unlikely to get a response from the right people.
@mrmartan - I have created issue https://github.com/dotnet/coreclr/issues/26053 to the best of my understanding of your problem.
It would be good to do some perf tracing to understand why containers with low CPU allocations are not performing as well as expected. Such low allocations are common on Kubernetes, so it may be worth running continuous perf benchmarks.
I now see two issues here: dotnet/coreclr#26053, which @VSadov opened (I will try to add some details to it), and possibly the one @tmds is hinting at, i.e. whether low performance is expected with a low CPU allocation (single core; EDIT: or rather, perceived by .NET as single core). I have a screenshot from a test I performed on our test Kubernetes cluster (mostly the same configuration as discussed above). In the test there was only one replica of the app in question, with the kube CPU limit set to 700m. The image is from SuperBenchmarker (unfortunately I cropped it without the scale).
These numbers are hard to compare.
I see your point. I no longer have the setup to test it again and post results here, but at the time I could have given it 1000m and the result would have been the same (while actual CPU consumption would not rise). The point is that the application was not CPU-limited; CoreCLR was crippled since it thought it was running on a single-core machine. On K8s/Linux, a 700m CPU limit does not imply 1 CPU/thread, but CoreCLR behaved as though it did.
With dotnet/corefx#25193 in place, `Environment.ProcessorCount` can now reflect limits imposed via Docker. When running within

docker run --cpus=2 -ti microsoft/dotnet-buildtools-prereqs:rhel7_prereqs_2 /bin/bash

this call returns 2 as expected on an 8-core host. However, when limits are enforced in a different way, it fails to detect them. Let's say I limit the container to only the first two cores. This shows that the container is limited to two cores, but `Environment.ProcessorCount` returns 8.

Related to https://github.com/dotnet/corefx/issues/34920

The value obtained via `sched_getaffinity()` is also 2. Note that `sched_getaffinity()` does not work for the first case, when limits are enforced by limiting cycles. Perhaps we need to do both and return the lower value.
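"Do both and return the lower value" could look roughly like this in C. Here `quota_cpus` stands in for whatever the cycle-limit detection yields (<= 0 meaning no quota was found), and the function name is illustrative, not an existing runtime API:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Sketch: combine the affinity-mask count (catches --cpuset-cpus) with a
 * quota-derived count (catches --cpus) and report the smaller of the two. */
static int effective_processor_count(int quota_cpus)
{
    cpu_set_t set;
    int affinity = (int)sysconf(_SC_NPROCESSORS_ONLN);
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        affinity = CPU_COUNT(&set);
    if (quota_cpus > 0 && quota_cpus < affinity)
        return quota_cpus;       /* the cycle limit is the tighter bound */
    return affinity;             /* affinity mask (or no quota detected) */
}
```

This way `--cpus=2` on an 8-core host and `--cpuset-cpus=0,1` both end up reporting 2, whichever mechanism the container runtime used.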
cc: @janvorli