scx_bpfland: primary domain #491
I skimmed through the PR, and it looks great! Especially, I like the way you implement CpuMask and enable_cpu() for the initialization. I learned a new thing. :-)
Thanks for the review @multics69, yeah the enable_cpu() stuff is pretty cool, something that I learned from @Byte-Lab :)
	struct bpf_cpumask *mask;
	int err = 0;

	bpf_rcu_read_lock();
Hmm... can you please explain the locking a bit? I don't understand what rcu read locks are protecting.
Thanks @htejun, the rcu read lock is definitely not needed, the goal was just to make the verifier happy.
I've pushed a new change removing the rcu locking both from the allowed_cpumask initialization and, more importantly, from pick_idle_cpu(). We still need it to call bpf_cpumask_set_cpu(), but that only affects the initialization, so basically there's no extra locking involved at runtime, which is nice.
8e43175 to 67254e4
Signed-off-by: Andrea Righi <[email protected]>
Abbreviate the statistics reported to stdout and remove the slice_ms metric: this metric can be easily derived from slice_ns, slice_ns_min and nr_wait, which are already reported to stdout.

Signed-off-by: Andrea Righi <[email protected]>
Allow specifying a primary scheduling domain via the new command line option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of arbitrary length, representing the CPUs assigned to the domain.

If this option is not specified, the scheduler will use all the available CPUs in the system as the primary domain (no behavior change).

Otherwise, if a primary scheduling domain is defined, the scheduler will try to dispatch tasks only to the CPUs assigned to the primary domain, until these CPUs are saturated, at which point tasks may overflow to other available CPUs.

This feature can be used to prioritize certain cores over others and it can be really effective in systems with heterogeneous cores (e.g., hybrid systems with P-cores and E-cores).

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
   - 6 P-cores 0..5 with 2 CPUs each (CPU from 0..11)
   - 8 E-cores 6..13 with 1 CPU each (CPU from 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html): this generates a steady workload in the system without over-saturating the CPUs.

Use different scheduler configurations:
 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)

Measure performance (fps) and power consumption (W).

== Result ==

                  +-----+-----+------+-------+--------+
                  | min | max | avg  |       |        |
                  | fps | fps | fps  | stdev | power  |
+-----------------+-----+-----+------+-------+--------+
| EEVDF           | 28  | 34  | 31.0 | 1.73  | 3.5W   |
| bpfland-p-cores | 33  | 34  | 33.5 | 0.29  | 3.5W   |
| bpfland-e-cores | 25  | 26  | 25.5 | 0.29  | 2.2W   |
+-----------------+-----+-----+------+-------+--------+

Using a primary scheduling domain of only P-cores with scx_bpfland achieves a more stable and predictable level of performance, with an average of 33.5 fps and an error of ±0.5 fps.

In contrast, using EEVDF results in an average frame rate of 31.0 fps with an error of ±3.0 fps, indicating slightly less consistency, because tasks are evenly distributed across all the cores in the system (both slow and fast cores).

On the other hand, using a scheduling domain solely of E-cores with scx_bpfland results in a lower average frame rate (25.5 fps), though performance remains stable (error of ±0.5 fps). Power consumption is also reduced, averaging 2.2W, compared to 3.5W with either of the other configurations.

== Conclusion ==

In summary, with this change users have the flexibility to prioritize scheduling on performance cores for better performance and consistency, or to prioritize energy-efficient cores for reduced power consumption, on hybrid architectures.

Moreover, this feature can also be used to minimize the number of cores used by the scheduler, until they reach full capacity. This capability can be useful for reducing power consumption even in homogeneous systems or for conducting scheduling experiments with smaller sets of cores, provided the system is not overcommitted.

Signed-off-by: Andrea Righi <[email protected]>
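To make the CPUMASK format concrete, here is a minimal sketch of how a hex mask of arbitrary length could be split into 64-bit words and queried per CPU. The type `HexCpuMask` and its methods `parse` and `is_cpu_set` are hypothetical names chosen for illustration; they are not the `CpuMask` implementation from this PR, which may differ in layout and error handling.

```rust
/// Hypothetical illustration type (not the PR's CpuMask): bit `n` set
/// in the mask means CPU `n` belongs to the primary domain.
#[derive(Debug, PartialEq)]
struct HexCpuMask {
    /// 64-bit words, least-significant word first.
    words: Vec<u64>,
}

impl HexCpuMask {
    /// Parse a hex string such as "0x00fff" (the "0x" prefix is optional).
    fn parse(hex_str: &str) -> Result<Self, String> {
        let digits = hex_str.strip_prefix("0x").unwrap_or(hex_str);
        if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("invalid hex cpumask: {hex_str:?}"));
        }
        let mut words = Vec::new();
        // Walk the digits from the right in chunks of 16 hex digits
        // (64 bits), so the first word pushed covers CPUs 0..63.
        for chunk in digits.as_bytes().rchunks(16) {
            let s = std::str::from_utf8(chunk).map_err(|e| e.to_string())?;
            words.push(u64::from_str_radix(s, 16).map_err(|e| e.to_string())?);
        }
        Ok(Self { words })
    }

    /// Return true if `cpu` is part of the mask; CPUs beyond the parsed
    /// words are treated as not set.
    fn is_cpu_set(&self, cpu: usize) -> bool {
        self.words
            .get(cpu / 64)
            .map_or(false, |&w| w & (1u64 << (cpu % 64)) != 0)
    }
}
```

With the masks from the test above, `0x00fff` selects CPUs 0..11 (the P-core CPUs on this machine) and `0xff000` selects CPUs 12..19 (the E-core CPUs).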
67254e4 to f9a9944
            .map_or(false, |&val| val & (1 << bit) != 0)
    }

    pub fn from_str(hex_str: &str) -> Result<Self, std::num::ParseIntError> {
In the future it might be nice to accept a --cpu-list similar to how taskset does it.
@hodgesds good idea. I'll add that as a future change, it can be done all in Rust, so it should be trivial, thanks for the suggestion!
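As a rough idea of what the suggested follow-up could look like, the sketch below parses a taskset-style list such as `0-3,8` and renders it as a hex mask suitable for `--primary-domain`. The function names `parse_cpu_list` and `cpus_to_hex_mask`, and the `nr_cpus` bound, are assumptions made for this example and are not part of scx_bpfland; stride syntax (e.g. `0-10:2`) is intentionally not handled.

```rust
/// Parse one CPU id and check it against the number of CPUs in the system.
fn parse_cpu(s: &str, nr_cpus: usize) -> Result<usize, String> {
    let cpu = s
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("invalid CPU id {s:?}: {e}"))?;
    if cpu >= nr_cpus {
        return Err(format!("CPU {cpu} is out of range (nr_cpus = {nr_cpus})"));
    }
    Ok(cpu)
}

/// Parse a comma-separated list of CPU ids and inclusive ranges,
/// e.g. "0-3,8", into a sorted, de-duplicated list of CPU ids.
fn parse_cpu_list(list: &str, nr_cpus: usize) -> Result<Vec<usize>, String> {
    let mut cpus = Vec::new();
    for item in list.split(',') {
        let (start, end) = match item.split_once('-') {
            Some((a, b)) => (parse_cpu(a, nr_cpus)?, parse_cpu(b, nr_cpus)?),
            None => {
                let cpu = parse_cpu(item, nr_cpus)?;
                (cpu, cpu)
            }
        };
        if start > end {
            return Err(format!("invalid range {item:?}: start is after end"));
        }
        cpus.extend(start..=end);
    }
    cpus.sort_unstable();
    cpus.dedup();
    Ok(cpus)
}

/// Render a list of CPU ids as a hex mask where bit `n` stands for CPU `n`.
fn cpus_to_hex_mask(cpus: &[usize]) -> String {
    // At least one word, so an empty list renders as "0x0".
    let nr_words = cpus.iter().max().map_or(1, |&m| m / 64 + 1);
    let mut words = vec![0u64; nr_words];
    for &cpu in cpus {
        words[cpu / 64] |= 1u64 << (cpu % 64);
    }
    // Most-significant word first, without leading zeros; the remaining
    // words are zero-padded to 16 hex digits each.
    let mut out = format!("0x{:x}", words[nr_words - 1]);
    for w in words[..nr_words - 1].iter().rev() {
        out.push_str(&format!("{:016x}", w));
    }
    out
}
```

On the 20-CPU machine used in the tests, the list `12-19` would map to the same E-core mask `0xff000` that is passed by hand today.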
This adds the concept of a "primary scheduling domain" to bpfland, which can be used to prioritize scheduling on a subset of cores. It is mostly aimed at supporting hybrid architectures (systems with P-cores / E-cores).

The user can define the primary scheduling domain via the new option `--primary-domain CPUMASK`; if a primary domain is defined, the scheduler will try to dispatch tasks on the cores specified by CPUMASK, unless they become saturated, at which point tasks will be allowed to overflow to the other available cores.

This makes it possible to define multiple performance profiles on hybrid systems, e.g., use a primary domain of E-cores only for lower power consumption or a primary domain of P-cores only for better performance.

Some test results on a Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H hybrid cores: