-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scx_rustland: improve audio workload and performance predictability #307
Conversation
Do not always assign the maximum time slice to interactive tasks, but use the same value of the dynamic time slice for everyone. This seems to prevent potential audio cracking when the system is over commissioned. Signed-off-by: Andrea Righi <[email protected]>
The shared DSQ is typically used to prioritize tasks and dispatch them on the first CPU available, so consume from the shared DSQ before the local CPU DSQ. Signed-off-by: Andrea Righi <[email protected]>
Dispatch non-interactive tasks on the CPU selected by the built-in idle selection logic and allow interactive tasks to be dispatched on any CPU. Signed-off-by: Andrea Righi <[email protected]>
* advantage of the cached working set. | ||
*/ | ||
if (scx_bpf_test_and_clear_cpu_idle(prev_cpu)) { | ||
tctx->dispatch_local = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, you can now directly call scx_bpf_dispatch()
from select_cpu()
, so you don' t have to set dispatch_local
to tell enqueue()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can definitely keep the flag to track the direct dispatches, and then dispatch directly from select_cpu()
... I guess my brain is just too used to the "old" dispatch_local way :)
* If the task was directly dispatched give it a second chance to | ||
* remain on the current CPU, instead of immediately migrating it. | ||
*/ | ||
if (tctx->dispatch_local) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then maybe this can be renamed to clarify that it's tracking what happened the last time? Not a big difference but might make it a bit easier to follow.
Overall, yeah, other schedulers too are a lot more eager to move tasks across CPU boundaries than the kernel's. This has some benefits in terms of work conservation but at the expense of L1/2 locality. This probably is an area other schedulers can improve on too - e.g. consider overall saturation level of the system, how long the prev CPU has been busy since the waking task ran on it the last time (as a proxy for measuring cache decay), and how soon the CPU is expected to be freed up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about renaming the variable to clarify that it's basically used to keep track of the previous direct dispatch. Will change it and clarify also the comment.
About being less aggressive with migrations, it'd be nice also to add some topology awareness to the equation, like try to stick not just on prev_cpu
, but on a subset of CPUs that share the L2 cache with prev_cpu
, for example. Maybe that's something that we could even add to the in-kernel idle selection logic to improve all schedulers in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, teaching some topology awareness to select_cpu_dfl()
is something on the todo list.
Implement a second-chance migration in select_cpu(): after a task has been dispatched directly do not try to migrate it immediately on a different CPU, but force it to stay on prev_cpu for another round. This seems quite effective on certain architectures (such as on a system with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide noticeable benefits with gaming or WebGL applications (such as https://webglsamples.org/aquarium/aquarium.html) under regular workload conditions (around +5% fps). Signed-off-by: Andrea Righi <[email protected]>
03ec0f7
to
4791d86
Compare
Alright, renamed the flag |
A set of changes that seems to mitigate the audio cracking issues that I was experiencing when the system is massively over-commissioned.
Here's a summary of these changes:
select_cpu()
: after a task has been directly dispatched prevent to immediately migrate it on a different CPU, but force it to stay onprev_cpu
for another roundTested both on an Intel Core i7 and AMD Ryzen 7 5800X 8-Core with multiple games / audio workloads. On both systems these changes seem to improve the responsiveness both in terms of fps and audio quality.