scx_rusty: Add mempolicy checks to rusty #364
Conversation
Force-pushed from a23cca6 to 60fa13e
Definitely think this will be useful, thanks for working on it. Putting it back into your queue for now per our discussion.
static u32 task_pick_mempolicy_domain(
	struct task_struct *p, struct task_ctx *taskc, u32 rr_token)
{
	u32 ret = NO_DOM_FOUND;
nit: Just to match kernel coding style, could you please add a newline between variable declarations and the rest of the function, and move all variable declarations to the top of whatever scope they're declared in?
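i.e., something along these lines (illustrative only, not the actual function body):

```c
static u32 example_pick_domain(struct task_ctx *taskc)
{
	/* All declarations at the top of the scope... */
	u32 ret = NO_DOM_FOUND;
	u32 dom_id;

	/* ...followed by a blank line before the rest of the function. */
	bpf_for(dom_id, 0, nr_doms) {
		/* ... */
	}

	return ret;
}
```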
	if (ret != NO_DOM_FOUND && i % rr_token == 0)
		return ret;
Spoke offline -- we're matching NUMA node with domain ID here, but they might not match up if there are multiple CCXs per node. We can instead have something like a mask of preferred domains (that's informed by `p->mempolicy->nodes`), and then query that when looking for a domain when we do round robin in the caller, and in the load balancer.
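Something along these lines, maybe (a rough sketch: `node_in_mempolicy()` is a hypothetical accessor for `p->mempolicy->nodes`, and how that struct is read from BPF is hand-waved here; `dom_node_id()` and `nr_doms` are from the existing code):

```c
/*
 * Sketch: build a bitmask of domains whose NUMA node appears in the task's
 * mempolicy nodemask, instead of assuming node ID == domain ID. The mask
 * can then be consulted in the round-robin pick and in the load balancer.
 */
static void task_set_preferred_dom_mask(struct task_struct *p,
					struct task_ctx *taskc)
{
	u32 dom_id;

	taskc->preferred_dom_mask = 0;
	if (!p->mempolicy)
		return;

	bpf_for(dom_id, 0, nr_doms) {
		/* Hypothetical: is this domain's node set in p->mempolicy->nodes? */
		if (node_in_mempolicy(p, dom_node_id(dom_id)))
			taskc->preferred_dom_mask |= 1LLU << dom_id;
	}
}
```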
Force-pushed from 607c4e4 to 3a9e468
Some more testing results. It looks like a slight dip in:

[results omitted]

CFS:

[results omitted]

Tested on a multi-NUMA machine with:

[machine details omitted]

Sorry for the formatting issues!
@@ -753,8 +755,18 @@ impl<'a, 'b> LoadBalancer<'a, 'b> {
         skip_kworkers: bool,
     ) -> Option<&'d TaskInfo>
     where
-        I: IntoIterator<Item = &'d TaskInfo>,
+        I: IntoIterator<Item = &'d TaskInfo> + Clone,
I think this is ok because it's only cloning the iterator, but I may be wrong.
Yeah, honestly not sure.
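For what it's worth, a small standalone Rust example of why the clone is cheap here: cloning an iterator over references duplicates the iterator's cursor, not the underlying `TaskInfo` values (names below are illustrative):

```rust
struct TaskInfo {
    load: f64,
}

fn main() {
    let tasks = vec![TaskInfo { load: 1.0 }, TaskInfo { load: 2.5 }];

    // `iter()` yields `&TaskInfo`; cloning it copies a couple of pointers.
    // The tasks themselves are never cloned (TaskInfo isn't even Clone).
    let by_load = tasks.iter();
    let first_pass = by_load.clone();

    let total: f64 = first_pass.map(|t| t.load).sum();
    assert_eq!(total, 3.5);
    assert_eq!(by_load.count(), 2); // original iterator unaffected
}
```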
Force-pushed from 9d8484d to 94dee33
Pushed up a fix where only the first preferred node map was being used. Similar results afterwards:

[results omitted]
Force-pushed from 94dee33 to a080dcd
	u32 dom_id = 0;

	bpf_for(dom_id, 0, nr_doms) {
		if (!(dom_node_id(dom_id) == node_id))
nit: can we do `if (dom_node_id(dom_id) != node_id)`?
 * Sets the preferred domain mask according to the mempolicy. See man(2)
 * set_mempolicy for more details on mempolicy.
 */
static int task_set_preferred_mempolicy_dom_mask(struct task_struct *p,
can we change the return value to `bool`? or maybe even simpler would be to make it `void`, and just check if `preferred_dom_mask` is nonzero in the caller?
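e.g. a quick sketch of the `void` variant (caller shape taken loosely from this PR, not the final code):

```c
/* Sketch: only populate taskc->preferred_dom_mask as a side effect. */
static void task_set_preferred_mempolicy_dom_mask(struct task_struct *p,
						   struct task_ctx *taskc)
{
	taskc->preferred_dom_mask = 0;
	/* ... set a bit for each domain allowed by p->mempolicy ... */
}

/* Caller checks the mask instead of a return value: */
task_set_preferred_mempolicy_dom_mask(p, taskc);
if (taskc->preferred_dom_mask == 0)
	return NO_DOM_FOUND; /* no mempolicy preference, use the normal path */
```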
	if (taskc->preferred_dom_mask <= 0)
		continue;
Hmm, let's maybe just do `== 0`? We really just care about whether there are any bits set, right?
Yeah, I was just being lazy with cleaning up those function signatures. Good call!
	if (taskc->preferred_dom_mask <= 0)
		continue;

	if ((1LLU << dom) & taskc->preferred_dom_mask != 0)
can you add parens around the bitwise comparison?
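For context, the reason: `!=` binds tighter than `&` in C, so the unparenthesized form doesn't test the intended bit. A tiny standalone demo:

```c
#include <stdio.h>

int main(void)
{
	unsigned long long mask = 0xcULL; /* domains 2 and 3 preferred */
	unsigned int dom = 2;

	/* Without parens this parses as (1LLU << dom) & (mask != 0), which
	 * ANDs bit `dom` with 0 or 1 -- true only when dom == 0. */
	printf("%d\n", ((1LLU << dom) & mask != 0) ? 1 : 0);   /* prints 0 */

	/* With explicit parens, the intended membership test: */
	printf("%d\n", (((1LLU << dom) & mask) != 0) ? 1 : 0); /* prints 1 */

	return 0;
}
```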
	if (cpu < 0 || cpu >= MAX_CPUS)
		return NO_DOM_FOUND;

	taskc->dom_mask = 0;
	taskc->preferred_dom_mask = 0;
can we bring this into `task_set_preferred_mempolicy_dom_mask`?
	if (has_preferred_dom < 0)
		continue;

	if (((1LLU << dom) & taskc->preferred_dom_mask))
Should we also check `&& preferred_dom == NO_DOM_FOUND`?

Edit: Changed from `!= NO_DOM_FOUND` to `== NO_DOM_FOUND`.
Hmm @hodgesds, looks like we still need this check?
I double checked this; since `preferred_dom` is initialized to `NO_DOM_FOUND`, it would always fail that check I think.
reading comprehension > me.... yeah that's good!
    {
        // First try to find a task in the preferred domain mask.
        if let Some(task) = tasks_by_load.clone().into_iter().take_while(
This is only useful if the mempolicy stuff has been used in the BPF prog, right? Should we maybe first check that `preferred_dom_mask` is nonzero so we can avoid the `clone`?

The other thing is that we may want to do this in two passes. If we do it this way, we may end up:

1. Successfully finding a task that matches the `preferred_dom_mask`, but;
2. Does not improve load imbalance to the point where it's chosen to migrate, and therefore;
3. Failing to find a task that does sufficiently address load imbalance, but didn't match on `preferred_dom_mask`.

So in other words, rather than first trying to find a task in the pull_dom that matches the preferred mask, and then skipping looking at the rest of the tasks if we do find one, we should do the whole load imbalance calculation logic again in `try_to_find_task()` after (2) above, but ignoring the `preferred_dom_mask`. Does that make sense?
I think that makes sense. How does this sound: if a filter lambda is passed to `try_find_move_task`, it becomes easy to check the return value in the loop where tasks are transferred, and if that is empty then remove the `preferred_dom_mask` filter and retry the move as you mentioned.
 fn try_find_move_task(
     &mut self,
     (push_dom, to_push): (&mut Domain, f64),
     (pull_dom, to_pull): (&mut Domain, f64),
+    task_filter: impl Fn(&TaskInfo) -> bool,
     to_xfer: f64,
 ) -> Result<Option<f64>> {
I guess the current approach kind of cheats by breaking the load balancing.
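Roughly, the caller could look like this (a sketch against the signature above; `pull_imbal`, `pull_dom_id`, and a `preferred_dom_mask` field on `TaskInfo` are assumptions, not the final code):

```rust
// First pass: only consider tasks whose mempolicy prefers the pull domain.
let mut transferred = self.try_find_move_task(
    (&mut push_dom, push_imbal),
    (&mut pull_dom, pull_imbal),
    |task: &TaskInfo| ((1u64 << pull_dom_id) & task.preferred_dom_mask) != 0,
    to_xfer,
)?;

if transferred.is_none() {
    // Nothing matched the mempolicy filter; retry without it so load
    // balancing still makes progress, as discussed above.
    transferred = self.try_find_move_task(
        (&mut push_dom, push_imbal),
        (&mut pull_dom, pull_imbal),
        |_| true,
        to_xfer,
    )?;
}
```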
Force-pushed from a080dcd to 7eca676
This change makes scx_rusty mempolicy aware. When a process uses set_mempolicy it can change NUMA memory preferences and cause performance issues when tasks are scheduled on remote NUMA nodes. This change modifies task_pick_domain to use the new helper method that returns the preferred node id.

Signed-off-by: Daniel Hodges <[email protected]>
Signed-off-by: Daniel Hodges <[email protected]>
Force-pushed from 7eca676 to 632b706
	if (taskc->preferred_dom_mask == 0
	    && preferred_dom != NO_DOM_FOUND)
		continue;
Should this be ||?
This change refactors some of the helper methods for getting the preferred node for tasks using mempolicy. The load balancing logic in try_find_move_task is updated to allow for a filter, which is used to filter for tasks with a preferred mempolicy.

Signed-off-by: Daniel Hodges <[email protected]>
Force-pushed from 632b706 to 27122a8
@@ -879,9 +888,19 @@ impl<'a, 'b> LoadBalancer<'a, 'b> {
                 pull_node.domains.insert(pull_dom);
                 break;
             }
-            let transferred = self.try_find_move_task((&mut push_dom, push_imbal),
+            let mut transferred = self.try_find_move_task((&mut push_dom, push_imbal),
It might be a good idea to avoid doing this work altogether if `preferred_dom_mask` is zero, as that will probably be the common case (right?). But this is fine for now, we can address this in a follow-on change if we choose. Thanks!
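For a possible follow-on, the guard could look something like this (sketch only; the `tasks` collection, `pull_dom_id`, and the `preferred_dom_mask` field are assumed names):

```rust
// Skip the mempolicy-filtered pass entirely when no candidate task has a
// preferred domain mask set, which should be the common case.
let any_mempolicy = push_dom
    .tasks
    .iter()
    .any(|task| task.preferred_dom_mask != 0);

let mut transferred = if any_mempolicy {
    self.try_find_move_task(
        (&mut push_dom, push_imbal),
        (&mut pull_dom, pull_imbal),
        |task: &TaskInfo| ((1u64 << pull_dom_id) & task.preferred_dom_mask) != 0,
        to_xfer,
    )?
} else {
    // Common case: go straight to the ordinary, unfiltered pass.
    self.try_find_move_task(
        (&mut push_dom, push_imbal),
        (&mut pull_dom, pull_imbal),
        |_| true,
        to_xfer,
    )?
};
```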
This change makes scx_rusty mempolicy aware. When a process uses `set_mempolicy` it can change NUMA memory preferences and cause performance issues when tasks are scheduled on remote NUMA nodes. This change modifies task_pick_domain to use the new helper method that returns the preferred node id.

With the `--mempolicy-affinity` flag set:

[results omitted]

scx_rusty default:

[results omitted]

cfs:

[results omitted]

The `bigheap` benchmark sees a moderate improvement, where most everything else is flat or worse. So maybe this flag makes sense if something is using mbind with lots of allocations.