Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_rusty: load balancing fixes #260

Merged
merged 2 commits into from
May 2, 2024

Conversation

danieljordan10
Copy link
Contributor

Fix 1:

scx_rusty: make per-task loads sensitive to lb_apply_weight

Rusty's load balancer calculates load differently based on average system CPU utilization in create_domain_hierarchy(). At >= 99.999% utilization, load is the product of a task's weight and duty cycle; below that, load is the same as the task's duty cycle.

populate_tasks_by_load(), however, always uses the product when calculating per-task load so that in the sub-99.999% util case, load is inflated, typically by a factor of 100 with a normal priority task. Tasks look too heavy to migrate as a result because a single task would transfer more load than the domain imbalance allows, leading to significant imbalance in some cases.

Make populate_tasks_by_load() calculate task load the same way as domain load, checking lb_apply_weight.

Here's a CPU utilization heatmap from an EPYC server before the fix. Each pair of CPU rows are SMT siblings, every 16 CPUs are an LLC, and the top and bottom halves are nodes 0 and 1. The total time is 2 minutes, and each square is a 1 second sample.

64users_iter1_240430_210719 mpstat %busy_small

Here's the same workload after the fix:

64users_iter1_240501_115702 mpstat %busy_small

Fix 2:

scx_rusty: compare abs values in xfer_between()

A LoadEntity gets the load to transfer between two entities by taking the minimum of their imbalances and reducing its abs value by xfer_ratio.

In practice self.imbal(), the push node or domain, always has positive imbalance and other.imbal(), the pull node or domain, always has negative imbalance, so other.imbal() is always the minimum even though the abs value of its imbalance might be greater than the abs value of self.imbal(). It seems like the intent is to take the minimum of the two absolute values instead to avoid overbalancing at the puller, so make both values abs.

Rusty's load balancer calculates load differently based on average
system CPU utilization in create_domain_hierarchy().  At >= 99.999%
utilization, load is the product of a task's weight and duty cycle;
below that, load is the same as the task's duty cycle.

populate_tasks_by_load(), however, always uses the product when
calculating per-task load so that in the sub-99.999% util case, load is
inflated, typically by a factor of 100 with a normal priority task.
Tasks look too heavy to migrate as a result because a single task would
transfer more load than the domain imbalance allows, leading to
significant imbalance in some cases.

Make populate_tasks_by_load() calculate task load the same way as
domain load, checking lb_apply_weight.

Signed-off-by: Daniel Jordan <[email protected]>
A LoadEntity gets the load to transfer between two entities by taking
the minimum of their imbalances and reducing its abs value by
xfer_ratio.

In practice self.imbal(), the push node or domain, always has positive
imbalance and other.imbal(), the pull node or domain, always has
negative imbalance, so other.imbal() is always the minimum even though
the abs value of its imbalance might be greater than the abs value of
self.imbal().  It seems like the intent is to take the minimum of the
two absolute values instead to avoid overbalancing at the puller, so
make both values abs.

Signed-off-by: Daniel Jordan <[email protected]>
Copy link
Contributor

@Byte-Lab Byte-Lab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, thanks a lot for noticing and fixing this. I expect I never noticed this because of the greedy stealing logic.

@Byte-Lab Byte-Lab merged commit 8fe900e into sched-ext:main May 2, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants