Make scx_rusty interactive #261
Conversation
Let's remove the extraneous copy pasting and use a lookup helper like we do for task and pcpu context. Signed-off-by: David Vernet <[email protected]>
This overall looks fantastic. Thanks for the excellent work.
Overall, it looks great to me. Thanks for the excellent work! I am glad that the high-level idea of LAVD was adopted, although, of course, the details are different. Also, I like the idea of considering the task's nice weight when calculating latency priority. I left a few comments asking for further clarification of high-level ideas behind some logic.
Tested on a 7950X3D, and it provides a massive improvement in interactivity. Benchmarks are also looking good compared to EEVDF.
scx_rusty doesn't do terribly well with interactive workloads. In order to improve the situation, this patch adds support for basic deadline scheduling in rusty. This approach doesn't incorporate eligibility, and simply uses a crude avg_runtime tracking approach to scale a task's deadline. In a series of follow-on changes, we'll update the scheduler to use more indicators of interactivity that affect both slice length and deadline calculation. Signed-off-by: David Vernet <[email protected]>
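The avg_runtime-based deadline idea described above can be sketched as a minimal standalone model. This is not rusty's actual BPF code; the struct, field names, and EWMA weights are all illustrative assumptions:

```c
#include <stdint.h>

/* Illustrative per-task state; hypothetical, not rusty's actual structs. */
struct task_stats {
	uint64_t vtime;       /* task's virtual time */
	uint64_t avg_runtime; /* moving average of observed runtimes, in ns */
};

/* Exponentially weighted moving average: blend the latest runtime sample
 * into the tracked average. The 3/4 weighting is arbitrary for illustration. */
static uint64_t update_avg_runtime(uint64_t avg, uint64_t sample)
{
	return (3 * avg + sample) / 4;
}

/* A task that historically runs in short bursts accrues a smaller offset,
 * so it gets an earlier deadline than a CPU hog with the same vtime. */
static uint64_t task_deadline(const struct task_stats *t)
{
	return t->vtime + t->avg_runtime;
}
```

Dispatch order would then be by ascending deadline rather than by raw vtime, which is what lets short-running interactive tasks jump ahead of long-running batch tasks.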
In user space in rusty, the tuner detects system utilization, and uses it to inform how we do load balancing, our greedy / direct cpumasks, etc. Something else we could be doing, but currently aren't, is using system utilization to inform how we dispatch tasks. We currently have a static, unchanging slice length for the runtime of the program, but no single length is efficient for all scenarios.

Giving a task a long slice length does have advantages, such as decreasing the number of involuntary context switches, decreasing the overhead of preemption by doing it less frequently, and possibly getting better cache locality due to a task running on a CPU for a longer amount of time. On the other hand, long slices can be problematic as well. When a system is highly utilized, a CPU-hogging task running for too long can harm interactive tasks. When the system is under-utilized, those interactive tasks can likely find an idle or under-utilized core to run on. When the system is over-utilized, however, they're likely to have to park in a runqueue.

Thus, in order to better accommodate such scenarios, this patch implements a rudimentary slice scaling mechanism in scx_rusty. Rather than having one global, static slice length, we instead have a dynamic, global slice length that can be changed depending on system utilization. When over-utilized, we go with a shorter slice length, and vice versa for when the system is under-utilized.

With Terraria, this results in roughly a 50% improvement in mean FPS when playing on an AMD Ryzen 9 7950X, while running Spotify, and stress-ng -c $((4 * $(nproc))). Signed-off-by: David Vernet <[email protected]>
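The slice-scaling mechanism above amounts to flipping one global slice length based on a utilization threshold. A rough sketch, with made-up constants and a made-up threshold API (rusty's actual defaults and tuner interface differ):

```c
#include <stdint.h>

/* Illustrative slice lengths; rusty's real under/over-util values are
 * configurable at load time. */
#define SLICE_NS_UNDERUTIL (20ULL * 1000 * 1000) /* 20ms: fewer preemptions */
#define SLICE_NS_OVERUTIL  ( 1ULL * 1000 * 1000) /* 1ms: protect interactivity */

/* The user-space tuner periodically samples system utilization (expressed
 * here as a 0..100 percentage) and picks one of the two global slice
 * lengths, which the BPF side then uses when dispatching tasks. */
static uint64_t tune_slice_ns(uint32_t util_pct, uint32_t overutil_thresh_pct)
{
	return util_pct >= overutil_thresh_pct ? SLICE_NS_OVERUTIL
					       : SLICE_NS_UNDERUTIL;
}
```

The key design point is that the slice is global and only the *selection* is dynamic; per-task slice lengths are explicitly left as future work.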
Overview
This is the first iteration that attempts to make scx_rusty more accommodating to interactive workloads. The patch set does the following:

- Makes scx_rusty a deadline scheduler, rather than a simple vtime scheduler. The deadline is calculated according to a number of factors: scx_rusty tracks a task's average runtime, and scales its deadline inversely according to its weight. scx_rusty also tracks the frequency with which a task is blocked, and the frequency with which a task wakes other tasks. In the former case, the task is likely to be a consumer, and in the latter case, a producer (or both, if the frequency is high for both). We calculate a lat_prio value for tasks when setting a deadline, which is inversely scaled by a task's block and waker frequency, and positively scaled by a task's average runtime. While we could almost certainly go a lot further with calculating tasks' lat_prio values, this already performs quite well. More on this below.
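The lat_prio scaling described above can be illustrated with a toy model. The frequency representation, constants, and function name here are all assumptions for the sketch; rusty tracks these signals with time-decayed counters, which this model omits:

```c
#include <stdint.h>

/* Illustrative interactivity signals, expressed as events per second. */
struct task_freqs {
	uint64_t blocked_freq;   /* how often the task blocks: consumer signal */
	uint64_t waker_freq;     /* how often it wakes others: producer signal */
	uint64_t avg_runtime_us; /* average runtime per slice */
};

/* Hypothetical lat_prio calculation: lower means more latency-sensitive.
 * The value is inversely scaled by block/waker frequency (producers and
 * consumers get a lower, better value), and positively scaled by average
 * runtime (CPU hogs get a higher, worse value). Constants are made up. */
static int64_t calc_lat_prio(const struct task_freqs *f)
{
	int64_t prio = 100; /* arbitrary baseline */

	prio -= (int64_t)(f->blocked_freq + f->waker_freq);
	prio += (int64_t)(f->avg_runtime_us / 1000);
	return prio;
}
```

A pipeline stage that both blocks often (waiting on input) and wakes others often (handing off output) scores low on both terms, so it is treated as highly latency-sensitive.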
- Updates scx_rusty to have a dynamic slice length. A longer slice length is used for under-utilized hosts, and a much shorter slice length is used for over-utilized hosts. These under/over-util slice length values can be set when the scheduler is loaded, but are static thereafter. The scheduler will track utilization, and will adjust to use one or the other slice length depending on that utilization. This is another example where we could almost certainly go further. For example, we could track slice length as a per-task construct, and e.g. throttle a task's slice when we determine that its average runtime is too high. For now, this also gives us very good results.
Results
scx_rusty seems to perform remarkably well on common interactive workloads. I'll describe some of the benchmarks I ran below. All benchmarks were run on a Ryzen 9 7950X. Each benchmark (unless stated otherwise) was run concurrently with an active Spotify session, as well as severe CPU contention via $ stress-ng -c $((4 * $(nproc))).

Terraria
Running Terraria with scx_rusty under the above conditions results in roughly a 60-70% FPS improvement compared to EEVDF (on v6.8). See the following video for a demonstration: https://drive.google.com/file/d/1fyHt9BYGha6apl7HAkibwpy52UTi8-AQ/view?usp=sharing

Civilization 6
Running the standard, CPU-bound AI benchmark on Civilization 6 resulted in roughly a 2.5x improvement with scx_rusty over EEVDF:

EEVDF:

scx_rusty:

If run without overcommitting the host, both schedulers appeared to perform equally.
kcompile
Finally, I also tested doing a kcompile while the system is severely overutilized:
EEVDF:
scx_rusty:

Only a sample size of one, so not statistically significant, but at least indicative that this doesn't cause a regression.
Future work
Here are some ideas for how this patch set could be expanded upon in the future:
- scx_rusty preempting to even further help interactive tasks