
Consider not restricting level of parallelism in a container when user did not ask for it. #13224

Closed
VSadov opened this issue Aug 7, 2019 · 16 comments · Fixed by dotnet/coreclr#26153

Comments

@VSadov
Member

VSadov commented Aug 7, 2019

In a container scenario, depending on the orchestration used, it is generally possible for a user to limit both the CPU quota and the level of parallelism.
One does not imply the other; that is why there are distinct knobs.

Example: on a machine with 4 cores, it is conceivable to have 8 containers, each limited to 1/8 of total CPU time for fairness/over-committing, but with either:

  1. each limited to using 1 CPU at a time. This would favor compute-intensive scenarios.
    OR
  2. each having access to all 4 CPUs. This would favor latency-sensitive, IO-performing, bursty workloads.

It looks like coreclr may see no distinction between these cases and treat them both as #1.

There is a common pattern where, once the quota falls below the user-set core count, we reduce the reported core count to match the quota.

The user could have asked for reduced parallelism, but did not.
Even if the user is doing this by accident, perhaps that choice should be respected? Otherwise, if there is a legitimate need for #2, it cannot be achieved.
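
For illustration, the clipping pattern in question looks roughly like this (a minimal C# sketch, assuming cgroup v1 CFS files as the quota source; this is not the actual coreclr code):

```csharp
using System;
using System.IO;

static class CpuCountSketch
{
    public static int GetClippedProcessorCount()
    {
        // Core count as seen by the process (before any clipping, in this sketch).
        int hardwareCount = Environment.ProcessorCount;

        // cgroup v1 CFS quota: -1 means "no quota set".
        long quotaUs = long.Parse(File.ReadAllText("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"));
        long periodUs = long.Parse(File.ReadAllText("/sys/fs/cgroup/cpu/cpu.cfs_period_us"));
        if (quotaUs <= 0 || periodUs <= 0)
            return hardwareCount;

        // Quota expressed as "how many CPUs' worth of time per period".
        int quotaCpus = (int)Math.Ceiling((double)quotaUs / periodUs);

        // The disputed step: clip the reported count to the quota, even though
        // the OS may still schedule our threads on all hardwareCount cores.
        return Math.Min(hardwareCount, Math.Max(1, quotaCpus));
    }
}
```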

==
https://github.com/dotnet/coreclr/blob/1d78ed88ff9078febdbb5c37173b313da5c023a3/src/utilcode/util.cpp#L1294

https://github.com/dotnet/coreclr/blob/31acad70cae93caf0afe753775f14ebe80db31ed/src/classlibnative/bcltype/system.cpp#L358

@mrmartan

mrmartan commented Aug 8, 2019

We have a real-world scenario where an application (which would fall under #2 above) performs desirably, consuming 300ms of CPU time every second and handling a high IO workload. Setting CPU limits near that consumption (a Docker container on Linux, with Kubernetes as the orchestrator) degrades performance to unusable levels. The issue manifests whenever .NET reports Environment.ProcessorCount < 2. The CPUs (cores) available to the application (its Linux process) are 16 in all cases.

@janvorli
Member

It seems we can:

  • Provide Environment.ProcessorQuota, change Environment.ProcessorCount back to reporting the real number of available cores, and make the runtime use the quota wherever it makes sense. This would also give app developers a way to make their behavior quota-sensitive.
  • Or provide a config knob to enable / disable the quota influencing the reported CPU count.
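
A hypothetical shape for the first option, just to make the proposal concrete (Environment.ProcessorQuota does not exist today; the names and types below are assumptions):

```csharp
// Sketch of the proposed surface, not an actual BCL type.
public static class EnvironmentProposal
{
    // Real number of cores the OS can schedule the process on (unclipped).
    public static int ProcessorCount { get; }

    // Fraction of total CPU time the container may consume, e.g. 0.5 for
    // "half a CPU's worth per second"; null when no quota is set.
    public static double? ProcessorQuota { get; }
}
```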

@VSadov
Member Author

VSadov commented Aug 12, 2019

provide a config knob to enable / disable the quota influencing the reported CPU count.

I wonder if there are enough scenarios where this knob would be useful.

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.
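
For example, affinity can be set from code via Process.ProcessorAffinity (a small sketch; cpuset-cpus or taskset achieve the same from outside the process):

```csharp
using System;
using System.Diagnostics;

class AffinityExample
{
    static void Main()
    {
        // Restrict this process to CPUs 0 and 1; any CFS quota stays untouched.
        Process current = Process.GetCurrentProcess();
        current.ProcessorAffinity = (IntPtr)0b0011;

        Console.WriteLine($"Affinity mask: 0x{current.ProcessorAffinity.ToInt64():X}");
    }
}
```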

@janvorli
Member

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.

So you are saying that we should completely revert the change we made in the past that added CPU count clipping by the quota?

@VSadov
Member Author

VSadov commented Aug 12, 2019

Yes. We should undo the CPU count clipping and always use the count of how many threads can be running at the same time.

For example, reducing the number of GC heaps, or even switching to workstation GC, when the OS may schedule threads on many more cores can lead to unnecessary contention.
Limiting the thread pool to low thread counts might lead to starvation.

That only makes sense when you really can be OS-scheduled on just a reduced number of cores. Otherwise, underestimating the core count may get us into suboptimal behaviors.
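
A small probe like the following (just an observation aid, not part of any proposal) shows what the runtime has derived from the reported count; as noted above, both the GC configuration and the thread pool limits are influenced by it:

```csharp
using System;
using System.Runtime;
using System.Threading;

class RuntimeHeuristicsProbe
{
    static void Main()
    {
        Console.WriteLine($"ProcessorCount: {Environment.ProcessorCount}");
        Console.WriteLine($"Server GC:      {GCSettings.IsServerGC}");

        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        Console.WriteLine($"Min threads:    worker={minWorker}, IOCP={minIo}");
        Console.WriteLine($"Max threads:    worker={maxWorker}, IOCP={maxIo}");
    }
}
```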

@VSadov
Member Author

VSadov commented Aug 12, 2019

We should also consider exposing the CPU quota as a public API, but that is not high priority. It is only useful when there is a way to check the current CPU load, and we do not have an API for that either.

Internally we need access to the quota in a few scenarios where we compare the current CPU load against the maximum possible - for example in thread pool starvation detection. However, I think all such cases are in coreclr, so that alone is not a reason for a public API, other than "if we need it, perhaps someone else needs it too"; but that, again, equally holds for a "current CPU load" API, and we do not have that.
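
Roughly the kind of check described above might look like this (a sketch only; the quota value is assumed to be available to the runtime internally, since there is no public API for it):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuLoadCheckSketch
{
    // cpuQuota is "how many CPUs' worth of time per second" the container may use,
    // e.g. 0.5; how it is obtained is left out here.
    static bool IsCpuSaturated(double cpuQuota)
    {
        Process p = Process.GetCurrentProcess();
        TimeSpan before = p.TotalProcessorTime;

        Thread.Sleep(500);              // sample window of 0.5 s
        p.Refresh();                    // drop cached process info
        TimeSpan used = p.TotalProcessorTime - before;

        // Compare observed CPU time against the maximum the quota allows
        // for this window; "close to the cap" suggests throttling, not idleness.
        double maxPossibleSeconds = cpuQuota * 0.5;
        return used.TotalSeconds >= 0.9 * maxPossibleSeconds;
    }
}
```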

@mrmartan

mrmartan commented Aug 12, 2019

Limiting thread pool to low thread counts might lead to starvation.

For worker threads, this is already configurable through COMPlus_ThreadPool_ForceMinWorkerThreads. There is currently no way to configure the IOCP thread pool size, and its size is affected by the CPU count clipping.
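
For reference, the managed ThreadPool API does accept a completion-port thread count, so the IOCP minimum can at least be raised from code (a small sketch; whether that is sufficient for a given workload is a separate question):

```csharp
using System;
using System.Threading;

class ThreadPoolMinimums
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        Console.WriteLine($"Defaults: worker={worker}, IOCP={iocp}");

        // Raise both minimums; returns false if the values are out of range.
        bool ok = ThreadPool.SetMinThreads(workerThreads: 16, completionPortThreads: 16);
        Console.WriteLine($"SetMinThreads succeeded: {ok}");
    }
}
```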

If limiting parallelism is the goal, it seems better to limit it at the OS level - by using affinity settings, cpuset-cpus, or cfs settings directly.

Please be aware that cpuset-cpus might be out of the developer's control. For example, Kubernetes deliberately does not allow setting it.

@VSadov
Member Author

VSadov commented Aug 12, 2019

@mrmartan - from what I read in https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ , limiting the CPU count is possible, although it is not considered common.

You would need the Guaranteed QoS class with non-fractional CPU counts. Then you can have:

.. static assignment increases CPU affinity and decreases context switches due to throttling for the CPU-bound workload ..

@mrmartan

mrmartan commented Aug 12, 2019

I already knew about that feature. I did not want to bring a beta feature into this discussion that seems to deal more with CPU exclusivity. In such a case one can assume you have the entire CPU to yourself, quotas could be considered meaningless, and I assume .NET would work well even now, wouldn't it? It also seemed to me, and I might be mistaken, that the feature is targeted more at single-threaded designs (like NodeJS or Redis) and NUMA use cases.

I guess saying that "Kubernetes does not allow for setting that on purpose" was misleading. Rather, in most .NET use cases you would want the default behavior of sharing CPUs, especially for bursty workloads, and let the orchestrator and the OS do their thing.

@jkotas
Member

jkotas commented Aug 13, 2019

So you are saying that we should completely revert the change we made in the past that added CPU count clipping by the quota?

+1. The CPU clipping change was justified by ASP.NET micro-benchmark improvements in constrained containers, but it is not architecturally sound. We should find a better way to get the improvements we got from the original change.

check the current CPU load and we do not have an API for that either.

FWIW, we have a few APIs for that: Process.TotalProcessorTime / UserProcessorTime and AppDomain.CurrentDomain.MonitoringTotalProcessorTime.
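
A minimal usage sketch of those APIs; note that AppDomain monitoring has to be enabled before MonitoringTotalProcessorTime returns meaningful values:

```csharp
using System;
using System.Diagnostics;

class CpuTimeApis
{
    static void Main()
    {
        // Must be switched on before the AppDomain counters start accumulating.
        AppDomain.MonitoringIsEnabled = true;

        // ... run some work here ...

        Process p = Process.GetCurrentProcess();
        Console.WriteLine($"Process total CPU: {p.TotalProcessorTime}");
        Console.WriteLine($"Process user CPU:  {p.UserProcessorTime}");
        Console.WriteLine($"AppDomain CPU:     {AppDomain.CurrentDomain.MonitoringTotalProcessorTime}");
    }
}
```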

@VSadov
Member Author

VSadov commented Aug 13, 2019

FWIW, we have a few APIs for that Process.TotalProcessorTime/UserProcessorTime and AppDomain.CurrentDomain.MonitoringTotalProcessorTime

I stand corrected :-) Environment.ProcessorQuota is more useful then.
It is not as critical as exposing memory quotas, since you cannot be terminated for exceeding the CPU quota, but it is useful.

The issue at hand may be of higher importance though. Perhaps even for 3.0 or 3.1.

@mjsabby
Contributor

mjsabby commented Aug 13, 2019

+1, this would be a useful thing to fix. Putting everything into bucket #1 from this issue is causing problems for our scenario as well.

@mrmartan

We should undo CPU count clipping.

Would this affect heaps/memory consumption in any way? In my case, not having CPU quota caused much higher memory consumption. Details here

@VSadov VSadov self-assigned this Aug 13, 2019
@kevingosse
Contributor

We should undo CPU count clipping.

Would this affect heaps/memory consumption in any way? In my case, not having CPU quota caused much higher memory consumption. Details here

You should still be able to get those benefits by adjusting the GCHeapCount setting

@kevingosse
Contributor

kevingosse commented Aug 14, 2019

It also seemed to me, and I might be mistaken, that the feature is targeted more at single-threaded designs (like NodeJS or Redis) and NUMA use cases.

Not only. We experimented for a while with quotas for latency-sensitive applications, with really bad results. Low-latency applications try to process most requests with a low response time, and if overloaded they let a few of them time out. If you're throttled, then for the remainder of that time share all the requests are going to time out, which negatively impacts the response time distribution. Having dedicated cores instead of a quota makes the application's behavior more predictable.

@mrmartan

You should still be able to get those benefits by adjusting the GCHeapCount setting

Yeah. Aren't we losing the "works great out of the box" experience though? I don't think #2 is a particularly fringe use case, and it would require quite some tinkering with CoreCLR settings. Oh well, I guess it's still better to be able to do it than not at all.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020