scx_bpfland: enhance performance consistency and predictability #410
Conversation
Small refactoring of the idle CPU selection logic:
- optimize idle CPU selection for tasks that can run on a single CPU
- drop the built-in idle selection policy and completely rely on the custom one

Signed-off-by: Andrea Righi <[email protected]>
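A minimal sketch of the single-CPU fast path described above (illustrative only, not the actual scx_bpfland code; it assumes the usual scx common BPF headers): a task that can run on exactly one CPU has no choice to make, so the scheduler only tests whether that CPU is idle and, if so, dispatches the task straight to its local DSQ.

    s32 BPF_STRUCT_OPS(sketch_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
        /*
         * Fast path: a task affine to a single CPU can only run on
         * prev_cpu, so skip the generic idle-selection logic entirely.
         */
        if (p->nr_cpus_allowed == 1) {
            if (scx_bpf_test_and_clear_cpu_idle(prev_cpu))
                scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
            return prev_cpu;
        }

        /* Custom idle-selection policy for all other tasks (omitted). */
        return prev_cpu;
    }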
Always use direct CPU dispatch for kthreads: there is no need to treat kthreads in a special way, simply reuse direct CPU dispatch to prioritize them. Moreover, change direct CPU dispatches to use scx_bpf_dispatch_vtime(), since we may now dispatch multiple tasks to the same per-CPU DSQ.

Signed-off-by: Andrea Righi <[email protected]>
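As a hedged sketch of this pattern (not the actual scx_bpfland code; the per-CPU DSQ id mapping is illustrative, and PF_KTHREAD is redefined because the macro is not exported via BTF), a kthread is dispatched directly to the DSQ of the CPU it runs on, using vtime ordering since that DSQ may now hold more than one task:

    #define PF_KTHREAD      0x00200000      /* from include/linux/sched.h */
    #define cpu_to_dsq(cpu) (2ULL + (cpu))  /* illustrative per-CPU DSQ ids */

    void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
    {
        s32 cpu = scx_bpf_task_cpu(p);

        if (p->flags & PF_KTHREAD) {
            /*
             * Direct CPU dispatch: vtime ordering keeps the per-CPU
             * DSQ fair now that it can hold more than one task.
             */
            scx_bpf_dispatch_vtime(p, cpu_to_dsq(cpu), SCX_SLICE_DFL,
                                   p->scx.dsq_vtime, enq_flags);
            return;
        }

        /* Everything else goes to the priority / shared DSQs (omitted). */
    }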
I've been testing it for a few days. Everything works very well.
Also tested by me. Performance seems very good, as do reliability and interactivity.
    struct bpf_cpumask *mask;
    int ret = false;

    bpf_rcu_read_lock();
This isn't necessarily from this PR, but what is the RCU read lock protecting?
Hm.. I had the impression that the verifier was complaining without the RCU protection, but it's not the case (and I also don't see anything to protect here). Probably it was needed to protect something else that was moved after some refactoring. Anyway, thanks for noticing it, I'll drop the RCU read lock here!
     * required to schedule a regular task. This ensures that regular tasks are not
     * starved by interactive tasks.
     */
    const volatile u64 starvation_thresh = 100;
Down the line, it may lead to more consistent behavior if the mechanism is time-based rather than based on the number of events. Another thing to consider is that tasks which are affine to some subset of CPUs might still be able to starve, so a mechanism that can transfer timing-out batch tasks to per-CPU DSQs would probably be more foolproof. Just ideas for possible future improvements.
Yes, a time-based approach would definitely be more effective. I'll work on that.
I also like the idea of transferring tasks to per-CPU DSQs. That's something I could also use for the CPU hotplug support, to move tasks to another DSQ from the .cpu_offline() callback. However, I haven't found a way to pop tasks from a non-local DSQ and move them to another DSQ, except via scx_bpf_consume(), but that doesn't give me the task: it just "internally" transfers it to the local DSQ, and of course I can't use consume in .cpu_offline()...
Signed-off-by: Andrea Righi <[email protected]>
Reduce the default time slice down to 5ms for a faster reaction and better system responsiveness when the system is overcommissioned. This also helps to provide a more predictable level of performance.

Signed-off-by: Andrea Righi <[email protected]>
There is no need to RCU protect the cpumask for the offline CPUs: it is created once when the scheduler is initialized and it is never deallocated.

Signed-off-by: Andrea Righi <[email protected]>
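A hedged sketch of that lifetime (illustrative names, assuming the usual scx common BPF headers; not the actual scx_bpfland code): the offline cpumask is allocated once in the init callback and published through a global kptr that is kept for the whole lifetime of the scheduler, so it is never freed while the scheduler is running.

    struct bpf_cpumask __kptr *offline_cpumask;

    s32 BPF_STRUCT_OPS_SLEEPABLE(sketch_init)
    {
        struct bpf_cpumask *mask;

        mask = bpf_cpumask_create();
        if (!mask)
            return -ENOMEM;

        /* Publish the mask; it stays allocated until the scheduler exits. */
        mask = bpf_kptr_xchg(&offline_cpumask, mask);
        if (mask)
            bpf_cpumask_release(mask);

        return 0;
    }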
Force-pushed from e2d8d82 to 47ac74b.
Removed the unnecessary RCU read lock protection and implemented a better time-based starvation prevention mechanism. Tested with the usual gaming + kernel build workload; everything looks good so far (better than before, apparently).
Tasks are consumed from the various DSQs in the following order:

    per-CPU DSQs => priority DSQ => shared DSQ

Tasks in the shared DSQ may be starved by those in the priority DSQ, which in turn may be starved by tasks dispatched to any per-CPU DSQ. To mitigate this, record the timestamp of the last task scheduling event both from the priority DSQ and the shared DSQ. If the starvation threshold is exceeded without consuming a task, the scheduler will be forced to consume a task from the corresponding DSQ. The starvation threshold can be adjusted using the --starvation-thresh command line parameter (default is 5ms).

Signed-off-by: Andrea Righi <[email protected]>
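A hedged sketch of this kind of time-based anti-starvation check, simplified to the shared DSQ only (SHARED_DSQ, PRIO_DSQ and cpu_to_dsq() are illustrative names, not the real scx_bpfland DSQ layout, and concurrent updates of the timestamp are ignored for brevity):

    #define SHARED_DSQ      0
    #define PRIO_DSQ        1
    #define cpu_to_dsq(cpu) (2ULL + (cpu))

    const volatile u64 starvation_thresh_ns = 5ULL * 1000000ULL;  /* 5ms */
    static u64 starvation_shared_ts;

    void BPF_STRUCT_OPS(sketch_dispatch, s32 cpu, struct task_struct *prev)
    {
        u64 now = bpf_ktime_get_ns();

        /* Shared DSQ starving? Service it before anything else. */
        if (now - starvation_shared_ts > starvation_thresh_ns &&
            scx_bpf_consume(SHARED_DSQ)) {
            starvation_shared_ts = now;
            return;
        }

        /* Normal order: per-CPU DSQ => priority DSQ => shared DSQ. */
        if (scx_bpf_consume(cpu_to_dsq(cpu)))
            return;
        if (scx_bpf_consume(PRIO_DSQ))
            return;
        if (scx_bpf_consume(SHARED_DSQ))
            starvation_shared_ts = now;
    }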
Force-pushed from 47ac74b to cfe2ed0.
Merging this as it seems pretty stable and well-tested.
A set of changes to make performance more consistent and prevent spikes: an optimized idle CPU selection with direct dispatch to the local DSQ (SCX_DSQ_LOCAL), direct CPU dispatch for kthreads, a shorter default time slice (5ms) and a time-based starvation prevention mechanism.

Thanks to the CachyOS community that provided an extensive amount of test results and feedback about these changes.