Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_rustland: improve audio workload and performance predictability #307

Merged
merged 4 commits into from
May 23, 2024

Conversation

arighi
Copy link
Contributor

@arighi arighi commented May 22, 2024

A set of changes that seems to mitigate the audio cracking issues that I was experiencing when the system is massively over-commissioned.

Here's a summary of these changes:

  • dispatch interactive tasks on the first CPU available and non-interactive tasks on the CPU selected by the idle selection logic
  • assign the same time slice both to interactive and non-interactive tasks
  • implement a second-chance migration in select_cpu(): after a task has been directly dispatched prevent to immediately migrate it on a different CPU, but force it to stay on prev_cpu for another round

Tested both on an Intel Core i7 and AMD Ryzen 7 5800X 8-Core with multiple games / audio workloads. On both systems these changes seem to improve the responsiveness both in terms of fps and audio quality.

Andrea Righi added 3 commits May 22, 2024 12:12
Do not always assign the maximum time slice to interactive tasks, but
use the same value of the dynamic time slice for everyone.

This seems to prevent potential audio cracking when the system is over
commissioned.

Signed-off-by: Andrea Righi <[email protected]>
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.

Signed-off-by: Andrea Righi <[email protected]>
Dispatch non-interactive tasks on the CPU selected by the built-in idle
selection logic and allow interactive tasks to be dispatched on any CPU.

Signed-off-by: Andrea Righi <[email protected]>
* advantage of the cached working set.
*/
if (scx_bpf_test_and_clear_cpu_idle(prev_cpu)) {
tctx->dispatch_local = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you can now directly call scx_bpf_dispatch() from select_cpu(), so you don' t have to set dispatch_local to tell enqueue().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can definitely keep the flag to track the direct dispatches, and then dispatch directly from select_cpu()... I guess my brain is just too used to the "old" dispatch_local way :)

* If the task was directly dispatched give it a second chance to
* remain on the current CPU, instead of immediately migrating it.
*/
if (tctx->dispatch_local) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then maybe this can be renamed to clarify that it's tracking what happened the last time? Not a big difference but might make it a bit easier to follow.

Overall, yeah, other schedulers too are a lot more eager to move tasks across CPU boundaries than the kernel's. This has some benefits in terms of work conservation but at the expense of L1/2 locality. This probably is an area other schedulers can improve on too - e.g. consider overall saturation level of the system, how long the prev CPU has been busy since the waking task ran on it the last time (as a proxy for measuring cache decay), and how soon the CPU is expected to be freed up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about renaming the variable to clarify that it's basically used to keep track of the previous direct dispatch. Will change it and clarify also the comment.

About being less aggressive with migrations, it'd be nice also to add some topology awareness to the equation, like try to stick not just on prev_cpu, but on a subset of CPUs that share the L2 cache with prev_cpu, for example. Maybe that's something that we could even add to the in-kernel idle selection logic to improve all schedulers in general?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, teaching some topology awareness to select_cpu_dfl() is something on the todo list.

Implement a second-chance migration in select_cpu(): after a task has
been dispatched directly do not try to migrate it immediately on a
different CPU, but force it to stay on prev_cpu for another round.

This seems quite effective on certain architectures (such as on a system
with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide
noticeable benefits with gaming or WebGL applications (such as
https://webglsamples.org/aquarium/aquarium.html) under regular workload
conditions (around +5% fps).

Signed-off-by: Andrea Righi <[email protected]>
@arighi
Copy link
Contributor Author

arighi commented May 22, 2024

Alright, renamed the flag dispatch_local -> allow_migration, changing the logic and updating the comment to make its purpose more clear, and moved all the direct dispatches to select_cpu().

@arighi arighi merged commit 3e2e581 into main May 23, 2024
1 check passed
@arighi arighi deleted the rustland-improve-audio branch May 23, 2024 05:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants