scx_bpfland: primary domain #491

arighi · 2024-08-12T15:57:11Z

This adds the concept of a "primary scheduling domain" to bpfland, which can be used to prioritize scheduling on a subset of cores. It is mostly aimed at supporting hybrid architectures (systems with P-cores / E-cores).

The user can define the primary scheduling domain via the new option --primary-domain CPUMASK; if a primary domain is defined, the scheduler will try to dispatch tasks on the cores specified by CPUMASK unless they become saturated, at which point tasks will be allowed to overflow onto the other available cores.

This makes it possible to define multiple performance profiles on hybrid systems, e.g., use a primary domain of E-cores only for lower power consumption, or a primary domain of P-cores only for better performance.
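The overflow behavior described above can be modeled in a few lines of plain Rust. This is only a sketch of the policy, not the scheduler's BPF code: the function name `pick_cpu`, the use of a single `u64` bitmask (which limits it to 64 CPUs), and the "lowest idle CPU wins" tie-break are all assumptions made for illustration.

```rust
/// Hypothetical model of the primary-domain policy (not the actual BPF code).
/// `primary` and `idle` are bitmasks where bit N represents CPU N.
/// Returns the CPU to use, or None if no CPU is idle at all.
fn pick_cpu(primary: u64, idle: u64) -> Option<u32> {
    // First choice: an idle CPU that belongs to the primary domain.
    let preferred = primary & idle;
    if preferred != 0 {
        return Some(preferred.trailing_zeros());
    }
    // Primary domain saturated: overflow to any other idle CPU.
    if idle != 0 {
        return Some(idle.trailing_zeros());
    }
    None
}
```

With a P-core primary domain of `0x00fff`, a task lands on CPU 3 when both CPU 3 and CPU 12 are idle, and only moves to CPU 12 once no CPU in the range 0..11 is idle.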

Some test results on a Dell Precision 5480 with a 13th Gen Intel(R) Core(TM) i7-13800H (hybrid cores):

    +-----------------+-----+-----+------+-------+--------+
    |                 | min | max | avg  |       |        |
    |                 | fps | fps | fps  | stdev | power  |
    +-----------------+-----+-----+------+-------+--------+
    | EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
    | bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
    | bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
    +-----------------+-----+-----+------+-------+--------+

multics69 · 2024-08-13

I skimmed through the PR, and it looks great! I especially like the way you implemented CpuMask and enable_cpu() for the initialization. I learned a new thing. :-)

arighi · 2024-08-13T07:47:08Z

Thanks for the review @multics69, yeah the enable_cpu() stuff is pretty cool, something that I learned from @Byte-Lab :)

htejun · 2024-08-13T18:46:22Z

scheds/rust/scx_bpfland/src/bpf/main.bpf.c

+	struct bpf_cpumask *mask;
+	int err = 0;
+
+	bpf_rcu_read_lock();


Hmm... can you please explain the locking a bit? I don't understand what rcu read locks are protecting.

arighi · 2024-08-14

Thanks @htejun, the rcu read lock is definitely not needed, the goal was just to make the verifier happy.

I've pushed a new change removing the rcu locking both from the allowed_cpumask initialization and, more importantly, from pick_idle_cpu(). We still need it to call bpf_cpumask_set_cpu(), but that only affects the initialization, so basically there's no extra locking involved at runtime, which is nice.

Allow specifying a primary scheduling domain via the new command line option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of arbitrary length, representing the CPUs assigned to the domain.

If this option is not specified the scheduler will use all the available CPUs in the system as primary domain (no behavior change).

Otherwise, if a primary scheduling domain is defined, the scheduler will try to dispatch tasks only to the CPUs assigned to the primary domain, until these CPUs are saturated, at which point tasks may overflow to other available CPUs.

This feature can be used to prioritize certain cores over others and it can be really effective in systems with heterogeneous cores (e.g., hybrid systems with P-cores and E-cores).

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
   - 6 P-cores 0..5 with 2 CPUs each (CPU from 0..11)
   - 8 E-cores 6..13 with 1 CPU each (CPU from 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html): this generates a steady workload in the system without over-saturating the CPUs.

Use different scheduler configurations:
 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)

Measure performance (fps) and power consumption (W).

== Result ==

    +-----------------+-----+-----+------+-------+--------+
    |                 | min | max | avg  |       |        |
    |                 | fps | fps | fps  | stdev | power  |
    +-----------------+-----+-----+------+-------+--------+
    | EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
    | bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
    | bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
    +-----------------+-----+-----+------+-------+--------+

Using a primary scheduling domain of only P-cores with scx_bpfland achieves a more stable and predictable level of performance, with an average of 33.5 fps and an error of ±0.5 fps.

In contrast, using EEVDF results in an average frame rate of 31.0 fps with an error of ±3.0 fps, indicating slightly less consistency, due to the fact that tasks are evenly distributed across all the cores in the system (both slow and fast cores).

On the other hand, using a scheduling domain made up solely of E-cores with scx_bpfland results in a lower average frame rate (25.5 fps), though performance remains stable (error of ±0.5 fps), and the power consumption is also reduced, averaging 2.2W, compared to 3.5W with either of the other configurations.

== Conclusion ==

In summary, with this change users have the flexibility to prioritize scheduling on performance cores for better performance and consistency, or prioritize energy efficient cores for reduced power consumption, on hybrid architectures.

Moreover, this feature can also be used to minimize the number of cores used by the scheduler, until they reach full capacity. This capability can be useful for reducing power consumption even in homogeneous systems or for conducting scheduling experiments with smaller sets of cores, provided the system is not overcommitted.

Signed-off-by: Andrea Righi <[email protected]>
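As an illustration of what "a hex number of arbitrary length" means for CPUMASK, here is a minimal Rust sketch that parses such a string into 64-bit words and tests whether a CPU belongs to the domain. The names `parse_cpumask` and `is_cpu_set` and the word layout (least-significant word first) are assumptions for this example; the PR's own `CpuMask` type may be organized differently.

```rust
/// Hypothetical helper: parse a hex CPU mask of arbitrary length into
/// 64-bit words, least-significant word first. An optional "0x" prefix
/// is accepted.
fn parse_cpumask(hex: &str) -> Result<Vec<u64>, std::num::ParseIntError> {
    let digits = hex.strip_prefix("0x").unwrap_or(hex);
    let chars: Vec<char> = digits.chars().collect();
    let mut words = Vec::new();
    // Walk the digits from the right in chunks of 16 hex digits (64 bits).
    for chunk in chars.rchunks(16) {
        let s: String = chunk.iter().collect();
        words.push(u64::from_str_radix(&s, 16)?);
    }
    Ok(words)
}

/// Return true if `cpu` is set in the mask; CPUs beyond the mask are unset.
fn is_cpu_set(words: &[u64], cpu: usize) -> bool {
    words
        .get(cpu / 64)
        .map_or(false, |&w| w & (1u64 << (cpu % 64)) != 0)
}
```

On the test machine above, `0x00fff` selects CPUs 0..11 (the P-cores) and `0xff000` selects CPUs 12..19 (the E-cores).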

hodgesds · 2024-08-14T15:17:54Z

scheds/rust/scx_bpfland/src/main.rs

+            .map_or(false, |&val| val & (1 << bit) != 0)
+    }
+
+    pub fn from_str(hex_str: &str) -> Result<Self, std::num::ParseIntError> {


in the future it might be nice to accept a --cpu-list similar to how taskset does it.

arighi · 2024-08-14

@hodgesds good idea. I'll add that as a future change; it can all be done in Rust, so it should be trivial. Thanks for the suggestion!
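For reference, a taskset-style list could be turned into a mask roughly as follows. This is a sketch of the idea only, not code from this PR: `parse_cpu_list` and `parse_cpu` are hypothetical names, the result is limited to 64 CPUs to keep the example short, and only the `N` and `N-M` forms are handled (no strides).

```rust
/// Hypothetical helper: parse a single CPU id, rejecting ids >= 64
/// because this sketch stores the mask in one u64.
fn parse_cpu(s: &str) -> Result<u32, String> {
    let cpu: u32 = s
        .trim()
        .parse()
        .map_err(|_| format!("invalid CPU id: {s}"))?;
    if cpu >= 64 {
        return Err(format!("CPU id out of range for this sketch: {cpu}"));
    }
    Ok(cpu)
}

/// Hypothetical helper: convert a list such as "0-3,8" into a bitmask
/// where bit N represents CPU N.
fn parse_cpu_list(list: &str) -> Result<u64, String> {
    let mut mask = 0u64;
    for item in list.split(',') {
        let item = item.trim();
        // An item is either a range "first-last" or a single CPU id.
        let (first, last) = match item.split_once('-') {
            Some((a, b)) => (parse_cpu(a)?, parse_cpu(b)?),
            None => {
                let cpu = parse_cpu(item)?;
                (cpu, cpu)
            }
        };
        if first > last {
            return Err(format!("invalid range: {item}"));
        }
        for cpu in first..=last {
            mask |= 1u64 << cpu;
        }
    }
    Ok(mask)
}
```

Under these assumptions, "0-11" yields the same mask as `--primary-domain 0x00fff` and "12-19" the same as `--primary-domain 0xff000`.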

multics69 reviewed Aug 13, 2024

htejun approved these changes Aug 13, 2024

arighi force-pushed the bpfland-primary-domain branch from 8e43175 to 67254e4 Compare August 14, 2024 06:48

arighi added 3 commits August 14, 2024 16:17

scx_bpfland: update copyright info

8656eff

Signed-off-by: Andrea Righi <[email protected]>

scx_bpfland: make output more compact

a6e977c

Abbreviate the statistics reported to stdout and remove the slice_ms metric: this metric can be easily derived from slice_ns, slice_ns_min and nr_wait, which are already reported to stdout.

Signed-off-by: Andrea Righi <[email protected]>

arighi force-pushed the bpfland-primary-domain branch from 67254e4 to f9a9944 Compare August 14, 2024 14:18

hodgesds reviewed Aug 14, 2024

arighi merged commit d2ef2fc into main Aug 14, 2024
1 check passed

arighi deleted the bpfland-primary-domain branch August 14, 2024 16:45
