
Consider not restricting level of parallelism in a container when user did not ask for it. #13224

Closed
VSadov opened this issue Aug 7, 2019 · 16 comments · Fixed by dotnet/coreclr#26153

Comments

@VSadov
Member

VSadov commented Aug 7, 2019

In a container scenario, depending on the orchestration used, it is generally possible for a user to limit both the CPU quota and the level of parallelism.
One does not imply the other; that is why there are distinct knobs.

Example: on a machine with 4 cores, it is conceivable to have 8 containers, each limited to 1/8 of total CPU time for fairness/over-committing, but with either:

  1. each limited to using 1 CPU at a time. This would favor compute-intensive scenarios.
    OR
  2. each having access to all 4 CPUs. This would favor latency-sensitive, IO-performing, bursty workloads.

It looks like coreclr may see no distinction between these cases and treat them both as #1.

There is a common pattern where, once the quota falls below the user-set core count, we reduce the reported core count to match the quota.

The user could have asked for reduced parallelism, but did not.
Even if the user is doing this by accident, perhaps that choice should be respected? Otherwise, if there is a legitimate need for #2, it cannot be achieved.
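
For illustration, the clipping pattern in question looks roughly like this (a minimal C# sketch, assuming cgroup v1 CFS files as the quota source; this is not the actual coreclr code):

```csharp
using System;
using System.IO;

static class CpuCountSketch
{
    public static int GetClippedProcessorCount()
    {
        // Core count as seen by the process (before any clipping, in this sketch).
        int hardwareCount = Environment.ProcessorCount;

        // cgroup v1 CFS quota: -1 means "no quota set".
        long quotaUs = long.Parse(File.ReadAllText("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"));
        long periodUs = long.Parse(File.ReadAllText("/sys/fs/cgroup/cpu/cpu.cfs_period_us"));
        if (quotaUs <= 0 || periodUs <= 0)
            return hardwareCount;

        // Quota expressed as "how many CPUs' worth of time per period".
        int quotaCpus = (int)Math.Ceiling((double)quotaUs / periodUs);

        // The disputed step: clip the reported count to the quota, even though
        // the OS may still schedule our threads on all hardwareCount cores.
        return Math.Min(hardwareCount, Math.Max(1, quotaCpus));
    }
}
```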

==
https://github.com/dotnet/coreclr/blob/1d78ed88ff9078febdbb5c37173b313da5c023a3/src/utilcode/util.cpp#L1294

https://github.com/dotnet/coreclr/blob/31acad70cae93caf0afe753775f14ebe80db31ed/src/classlibnative/bcltype/system.cpp#L358

@mrmartan

mrmartan commented Aug 8, 2019

We have a real-world scenario where an application (which would fall under #2 above) performs desirably, consuming 300ms of CPU time every second and handling a high IO workload. Setting CPU limits near that consumption (a Docker container on Linux, with Kubernetes as the orchestrator) degrades performance to unusable levels. The issue manifests whenever .NET reports Environment.ProcessorCount < 2. The CPUs (cores) available to the application (its Linux process) are 16 in all cases.

@janvorli
Member

It seems we can:

  • Provide Environment.ProcessorQuota, change Environment.ProcessorCount back to reporting the real number of available cores, and make the runtime use the quota wherever it makes sense. This would also give app developers a way to make their behavior quota-sensitive.
  • Or provide a config knob to enable / disable the quota influencing the reported CPU count.
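
A hypothetical shape for the first option, just to make the proposal concrete (Environment.ProcessorQuota does not exist today; the names and types below are assumptions):

```csharp
// Sketch of the proposed surface, not an actual BCL type.
public static class EnvironmentProposal
{
    // Real number of cores the OS can schedule the process on (unclipped).
    public static int ProcessorCount { get; }

    // Fraction of total CPU time the container may consume, e.g. 0.5 for
    // "half a CPU's worth per second"; null when no quota is set.
    public static double? ProcessorQuota { get; }
}
```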

@VSadov
Member Author

VSadov commented Aug 12, 2019

provide a config knob to enable / disable the quota influencing the reported CPU count.

I wonder if there are enough scenarios where this knob would be useful.

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.
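
For example, affinity can be set from code via Process.ProcessorAffinity (a small sketch; cpuset-cpus or taskset achieve the same from outside the process):

```csharp
using System;
using System.Diagnostics;

class AffinityExample
{
    static void Main()
    {
        // Restrict this process to CPUs 0 and 1; any CFS quota stays untouched.
        Process current = Process.GetCurrentProcess();
        current.ProcessorAffinity = (IntPtr)0b0011;

        Console.WriteLine($"Affinity mask: 0x{current.ProcessorAffinity.ToInt64():X}");
    }
}
```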

@janvorli
Member

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.

So you are saying that we should completely revert the change we made in the past that added CPU count clipping by the quota?

@VSadov
Member Author

VSadov commented Aug 12, 2019

Yes. We should undo the CPU count clipping and always use the count of how many threads can be running at the same time.

For example, reducing the number of GC heaps, or even switching to workstation GC, when the OS may schedule threads on many more cores can lead to unnecessary contention.
Limiting the thread pool to low thread counts might lead to starvation.

That only makes sense when you really can be OS-scheduled on just a reduced number of cores. Otherwise, underestimating the core count may get us into suboptimal behaviors.
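
A small probe like the following (just an observation aid, not part of any proposal) shows what the runtime has derived from the reported count; as noted above, both the GC configuration and the thread pool limits are influenced by it:

```csharp
using System;
using System.Runtime;
using System.Threading;

class RuntimeHeuristicsProbe
{
    static void Main()
    {
        Console.WriteLine($"ProcessorCount: {Environment.ProcessorCount}");
        Console.WriteLine($"Server GC:      {GCSettings.IsServerGC}");

        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine($"Min threads:    worker={minWorker}, IOCP={minIo}");
        Console.WriteLine($"Max threads:    worker={maxWorker}, IOCP={maxIo}");
    }
}
```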

@VSadov
Member Author

VSadov commented Aug 12, 2019

We should also consider exposing the CPU quota as a public API, but that is not high priority. It is only useful when there is a way to check the current CPU load, and we do not have an API for that either.

Internally we need access to the quota in a few scenarios where we compare the current CPU load against the maximum possible - for example in thread pool starvation detection. However, I think all such cases are in coreclr, so that alone is not a reason for a public API, other than "if we need it, perhaps someone else needs it too"; but that, again, equally holds for a "current CPU load" API, and we do not have that.
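
Roughly the kind of check described above might look like this (a sketch only; the quota value is assumed to be available to the runtime internally, since there is no public API for it):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuLoadCheckSketch
{
    // cpuQuota is "how many CPUs' worth of time per second" the container may use,
    // e.g. 0.5; how it is obtained is left out here.
    static bool IsCpuSaturated(double cpuQuota)
    {
        Process p = Process.GetCurrentProcess();
        TimeSpan before = p.TotalProcessorTime;

        Thread.Sleep(500);              // sample window of 0.5 s
        p.Refresh();                    // drop cached process info
        TimeSpan used = p.TotalProcessorTime - before;

        // Compare observed CPU time against the maximum the quota allows
        // for this window; "close to the cap" suggests throttling, not idleness.
        double maxPossibleSeconds = cpuQuota * 0.5;
        return used.TotalSeconds >= 0.9 * maxPossibleSeconds;
    }
}
```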

@mrmartan

mrmartan commented Aug 12, 2019

Limiting thread pool to low thread counts might lead to starvation.

For worker threads, this is already configurable through COMPlus_ThreadPool_ForceMinWorkerThreads. There is currently no way to configure the IOCP thread pool size, and its size is affected by the CPU count clipping.
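
For reference, the managed ThreadPool API does accept a completion-port thread count, so the IOCP minimum can at least be raised from code (a small sketch; whether that is sufficient for a given workload is a separate question):

```csharp
using System;
using System.Threading;

class ThreadPoolMinimums
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        Console.WriteLine($"Defaults: worker={worker}, IOCP={iocp}");

        // Raise both minimums; returns false if the values are out of range.
        bool ok = ThreadPool.SetMinThreads(workerThreads: 16, completionPortThreads: 16);
        Console.WriteLine($"SetMinThreads succeeded: {ok}");
    }
}
```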

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.

Please be aware that cpuset-cpus might be out of the developer's control. For example, Kubernetes deliberately does not allow setting it.

@VSadov
Member Author

VSadov commented Aug 12, 2019

@mrmartan - from what I read in https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ , limiting the CPU count is possible, although it is not considered common.

You would need the Guaranteed QoS class with non-fractional CPU counts. Then you can have:

.. static assignment increases CPU affinity and decreases context switches due to throttling for the CPU-bound workload ..

@mrmartan

mrmartan commented Aug 12, 2019

I already knew about that feature. I did not want to bring a beta feature into this discussion that seems to deal more with CPU exclusivity. In such a case one can assume you have the entire CPU to yourself, quotas could be considered meaningless, and I assume .NET would work well even now, wouldn't it? It also seemed to me, and I might be mistaken, that the feature is targeted more at single-threaded designs (like NodeJS or Redis) and NUMA use cases.

I guess saying that "Kubernetes does not allow for setting that on purpose" was misleading. Rather, in most .NET use cases you would want the default behavior of sharing CPUs, especially for bursty workloads, and let the orchestrator and the OS do their thing.

@jkotas
Member

jkotas commented Aug 13, 2019

So you are saying that we should completely revert the change we made in the past that added CPU count clipping by the quota?

+1. The CPU clipping change was justified by ASP.NET micro-benchmark improvements in constrained containers, but it is not architecturally sound. We should find a better way to get the improvements we got from the original change.

check the current CPU load and we do not have an API for that either.

FWIW, we have a few APIs for that: Process.TotalProcessorTime / UserProcessorTime and AppDomain.CurrentDomain.MonitoringTotalProcessorTime.
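
A minimal usage sketch of those APIs; note that AppDomain monitoring has to be enabled before MonitoringTotalProcessorTime returns meaningful values:

```csharp
using System;
using System.Diagnostics;

class CpuTimeApis
{
    static void Main()
    {
        // Must be switched on before the AppDomain counters start accumulating.
        AppDomain.MonitoringIsEnabled = true;

        // ... run some work here ...

        Process p = Process.GetCurrentProcess();
        Console.WriteLine($"Process total CPU: {p.TotalProcessorTime}");
        Console.WriteLine($"Process user CPU:  {p.UserProcessorTime}");
        Console.WriteLine($"AppDomain CPU:     {AppDomain.CurrentDomain.MonitoringTotalProcessorTime}");
    }
}
```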

@VSadov
Member Author

VSadov commented Aug 13, 2019

FWIW, we have a few APIs for that Process.TotalProcessorTime/UserProcessorTime and AppDomain.CurrentDomain.MonitoringTotalProcessorTime

I stand corrected :-) Environment.ProcessorQuota is more useful then.
It is not as critical as exposing memory quotas, since you cannot be terminated for exceeding the CPU quota, but it is useful.

The issue at hand may be of higher importance though. Perhaps even for 3.0 or 3.1.

@mjsabby
Contributor

mjsabby commented Aug 13, 2019

+1, this would be a useful thing to fix. Putting everything into bucket #1 from this issue is causing problems for our scenario as well.

@mrmartan

We should undo CPU count clipping.

Would this affect heaps/memory consumption in any way? In my case, not having CPU quota caused much higher memory consumption. Details here

@VSadov VSadov self-assigned this Aug 13, 2019
@kevingosse
Contributor

We should undo CPU count clipping.

Would this affect heaps/memory consumption in any way? In my case, not having CPU quota caused much higher memory consumption. Details here

You should still be able to get those benefits by adjusting the GCHeapCount setting

@kevingosse
Contributor

kevingosse commented Aug 14, 2019

It also seemed to me, and I might be mistaken, that the feature is targeted more at single-threaded designs (like NodeJS or Redis) and NUMA use cases.

Not only. We experimented for a while with quotas for latency-sensitive applications, with really bad results. Low-latency applications try to process most requests with a low response time, and if overloaded they let a few of them time out. If you're throttled, then for the remainder of that time share all the requests are going to time out, which negatively impacts the response time distribution. Having dedicated cores instead of a quota makes the application's behavior more predictable.

@mrmartan

You should still be able to get those benefits by adjusting the GCHeapCount setting

Yeah. Aren't we losing the "works great out of the box" experience though? I don't think #2 is a particularly fringe use case, and it would require quite some tinkering with CoreCLR settings. Oh well, I guess it's still better to be able to do it than not at all.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020