
scx_bpfland: primary domain #491

Merged
merged 3 commits into from
Aug 14, 2024


@arighi arighi commented Aug 12, 2024

This adds the concept of a "primary scheduling domain" to bpfland, which can be used to prioritize scheduling on a subset of cores. This is mostly aimed at supporting hybrid architectures (systems with P-cores / E-cores).

The user can define the primary scheduling domain via the new option --primary-domain CPUMASK; if a primary domain is defined, the scheduler will try to dispatch tasks on the cores specified by CPUMASK, unless they become saturated, at which point tasks will be allowed to overflow onto the other available cores.

This makes it possible to define multiple performance profiles in hybrid systems, e.g., use a primary domain of E-cores only for lower power consumption or a primary domain of P-cores only for better performance.

Some test results on a Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H hybrid cores:

                      +-----+-----+------+-------+--------+
                      | min | max | avg  |       |        |
                      | fps | fps | fps  | stdev | power  |
    +-----------------+-----+-----+------+-------+--------+
    | EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
    | bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
    | bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
    +-----------------+-----+-----+------+-------+--------+

@multics69 multics69 left a comment

I skimmed through the PR, and it looks great! Especially, I like the way you implement CpuMask and enable_cpu() for the initialization. I learned a new thing. :-)

arighi commented Aug 13, 2024

Thanks for the review @multics69, yeah the enable_cpu() stuff is pretty cool, something that I learned from @Byte-Lab :)

struct bpf_cpumask *mask;
int err = 0;

bpf_rcu_read_lock();

Hmm... can you please explain the locking a bit? I don't understand what rcu read locks are protecting.


Thanks @htejun, the rcu read lock is definitely not needed, the goal was just to make the verifier happy.

I've pushed a new change removing the rcu locking both from the allowed_cpumask initialization and, more importantly, from pick_idle_cpu(). We still need it to call bpf_cpumask_set_cpu(), but that only affects the initialization, so basically there's no extra locking involved at runtime, which is nice.

Abbreviate the statistics reported to stdout and remove the slice_ms
metric: this metric can be easily derived from slice_ns, slice_ns_min
and nr_wait, which are already reported to stdout.

Signed-off-by: Andrea Righi <[email protected]>

Allow specifying a primary scheduling domain via the new command line
option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of
arbitrary length, representing the CPUs assigned to the domain.
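
A minimal sketch of how a hex mask of arbitrary length could be parsed, assuming the mask is stored as 64-bit words with the least-significant word first. The names `parse_hex_mask` and `is_cpu_set` are hypothetical and this is not the `CpuMask` implementation from this PR:

```rust
/// Hypothetical helper (not the PR's CpuMask): parse a hex string of
/// arbitrary length into 64-bit words, least-significant word first.
pub fn parse_hex_mask(hex_str: &str) -> Result<Vec<u64>, String> {
    let s = hex_str.trim();
    // Accept an optional "0x" / "0X" prefix.
    let digits = s
        .strip_prefix("0x")
        .or_else(|| s.strip_prefix("0X"))
        .unwrap_or(s);
    // Reject empty input and anything that is not a plain hex digit.
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("invalid hex CPU mask: {hex_str:?}"));
    }
    // Walk the digits from the right in chunks of 16 hex digits (64 bits).
    let mut words = Vec::new();
    let mut end = digits.len();
    while end > 0 {
        let start = end.saturating_sub(16);
        let word = u64::from_str_radix(&digits[start..end], 16)
            .map_err(|e| e.to_string())?;
        words.push(word);
        end = start;
    }
    Ok(words)
}

/// Hypothetical helper: test whether `cpu` is set in the parsed mask.
pub fn is_cpu_set(words: &[u64], cpu: usize) -> bool {
    words
        .get(cpu / 64)
        .map_or(false, |&w| w & (1u64 << (cpu % 64)) != 0)
}
```

With this layout, a mask longer than 16 hex digits simply produces more than one word, so CPU 64 lives in bit 0 of the second word.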

If this option is not specified, the scheduler will use all the
available CPUs in the system as primary domain (no behavior change).

Otherwise, if a primary scheduling domain is defined, the scheduler will
try to dispatch tasks only to the CPUs assigned to the primary domain,
until these CPUs are saturated, at which point tasks may overflow to
other available CPUs.
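
The "prefer the primary domain, then overflow" policy described above can be modeled in a few lines. This is only a userspace sketch under simplifying assumptions: masks fit in a single `u64` (at most 64 CPUs), task affinity is ignored, and the function name `pick_idle_cpu_model` is hypothetical. The actual scheduler logic lives in BPF code and is not reproduced here:

```rust
/// Hypothetical userspace model of the policy, not the BPF code:
/// prefer an idle CPU in the primary domain and fall back to any
/// other idle CPU only when the primary domain has none.
pub fn pick_idle_cpu_model(primary: u64, idle: u64) -> Option<u32> {
    // Idle CPUs that also belong to the primary domain.
    let preferred = primary & idle;
    // Overflow: if the primary domain is saturated, consider every idle CPU.
    let candidates = if preferred != 0 { preferred } else { idle };
    if candidates == 0 {
        // No idle CPU anywhere in the system.
        None
    } else {
        // Pick the lowest-numbered candidate.
        Some(candidates.trailing_zeros())
    }
}
```

For example, with a primary domain of `0x00fff` and only CPUs 12..19 idle (`0xff000`), the model overflows to CPU 12.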

This feature can be used to prioritize certain cores over others and it
can be really effective in systems with heterogeneous cores (e.g.,
hybrid systems with P-cores and E-cores).

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
   - 6 P-cores 0..5  with 2 CPUs each (CPU from  0..11)
   - 8 E-cores 6..13 with 1 CPU  each (CPU from 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html):
this generates a steady workload in the system without over-saturating
the CPUs.

Use different scheduler configurations:

 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)

Measure performance (fps) and power consumption (W).
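
The two masks above follow directly from the CPU numbering listed in the hardware section (CPUs 0..11 on P-cores, CPUs 12..19 on E-cores). A small sketch of that arithmetic, assuming CPU ids below 64 and using a hypothetical helper name:

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: build a bitmask with one bit set per CPU in
/// the given inclusive range. Assumes every CPU id is below 64.
pub fn mask_from_range(cpus: RangeInclusive<u32>) -> u64 {
    cpus.fold(0u64, |mask, cpu| mask | (1u64 << cpu))
}

/// Format a mask the way it is written in the test description.
pub fn format_mask(mask: u64) -> String {
    // "#" adds the 0x prefix, "07" zero-pads to 7 characters in total.
    format!("{:#07x}", mask)
}
```

Here `mask_from_range(0..=11)` yields `0x00fff` and `mask_from_range(12..=19)` yields `0xff000`, matching the values passed to `--primary-domain`.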

== Result ==

                  +-----+-----+------+-------+--------+
                  | min | max | avg  |       |        |
                  | fps | fps | fps  | stdev | power  |
+-----------------+-----+-----+------+-------+--------+
| EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
| bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
| bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
+-----------------+-----+-----+------+-------+--------+

Using a primary scheduling domain of only P-cores with scx_bpfland
achieves a more stable and predictable level of performance, with an
average of 33.5 fps and an error of ±0.5 fps.

In contrast, using EEVDF results in an average frame rate of 31.0 fps
with an error of ±3.0 fps, indicating slightly less consistency, due to
the fact that tasks are evenly distributed across all the cores in the
system (both slow and fast cores).

On the other hand, using a scheduling domain of only E-cores with
scx_bpfland results in a lower average frame rate (25.5 fps) while
maintaining stable performance (error of ±0.5 fps). Power consumption
is also reduced, averaging 2.2W, compared to 3.5W with either of the
other configurations.
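
The ± figures quoted above appear consistent with half of the min/max range from the result table. That reading is an assumption, since the text does not say how the error was computed. A short sketch of that arithmetic, with hypothetical function names:

```rust
/// Assumed definition of the "error" quoted in the text: half of the
/// distance between the minimum and maximum observed frame rate.
pub fn half_range(min_fps: f64, max_fps: f64) -> f64 {
    (max_fps - min_fps) / 2.0
}

/// Midpoint of the observed range, for comparison with the reported
/// average in the table.
pub fn midpoint(min_fps: f64, max_fps: f64) -> f64 {
    (min_fps + max_fps) / 2.0
}
```

Under this assumption, the EEVDF row (28 to 34 fps) gives ±3.0 and both bpfland rows give ±0.5, which matches the numbers in the paragraphs above.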

== Conclusion ==

In summary, with this change users have the flexibility to prioritize
scheduling on performance cores for better performance and consistency,
or prioritize energy efficient cores for reduced power consumption, on
hybrid architectures.

Moreover, this feature can also be used to minimize the number of cores
used by the scheduler, until they reach full capacity. This capability
can be useful for reducing power consumption even in homogeneous systems
or for conducting scheduling experiments with smaller sets of cores,
provided the system is not overcommitted.

Signed-off-by: Andrea Righi <[email protected]>
.map_or(false, |&val| val & (1 << bit) != 0)
}

pub fn from_str(hex_str: &str) -> Result<Self, std::num::ParseIntError> {

In the future, it might be nice to accept a --cpu-list option, similar to how taskset does it.


@hodgesds good idea. I'll add that as a future change, it can be done all in Rust, so it should be trivial, thanks for the suggestion!
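
For illustration, a taskset-style CPU list could be converted to a mask roughly as follows. This is a sketch only, not the follow-up change mentioned above: the function names are hypothetical, the mask is limited to 64 CPUs, and the stride syntax that taskset also accepts (e.g. `0-10:2`) is not handled:

```rust
/// Hypothetical helper: parse a single CPU id and check that it fits
/// in a 64-bit mask.
fn parse_cpu(s: &str) -> Result<u32, String> {
    let cpu: u32 = s
        .trim()
        .parse()
        .map_err(|_| format!("invalid CPU id: {s:?}"))?;
    if cpu >= 64 {
        return Err(format!("CPU id {cpu} out of range for a 64-bit mask"));
    }
    Ok(cpu)
}

/// Hypothetical helper: parse a list such as "0-11" or "0,2,4-5" into
/// a bitmask with one bit per listed CPU.
pub fn parse_cpu_list(list: &str) -> Result<u64, String> {
    let mut mask = 0u64;
    for part in list.split(',') {
        let part = part.trim();
        // Each entry is either a single CPU ("3") or a range ("4-7").
        let (start, end) = match part.split_once('-') {
            Some((a, b)) => (parse_cpu(a)?, parse_cpu(b)?),
            None => {
                let cpu = parse_cpu(part)?;
                (cpu, cpu)
            }
        };
        if start > end {
            return Err(format!("descending range: {part}"));
        }
        for cpu in start..=end {
            mask |= 1u64 << cpu;
        }
    }
    Ok(mask)
}
```

With the hardware from the commit message, `parse_cpu_list("0-11")` would produce the same mask as `0x00fff`, and `parse_cpu_list("12-19")` the same as `0xff000`.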

@arighi arighi merged commit d2ef2fc into main Aug 14, 2024
1 check passed
@arighi arighi deleted the bpfland-primary-domain branch August 14, 2024 16:45