scx_utils: Add GPU topology #575

Merged
merged 1 commit into from
Aug 28, 2024

hodgesds
Contributor

Add GPU awareness to the topology crate.

Tested on a non-GPU machine:

$ sudo ./bin_local/bin/scx_layered  f:user.json 
21:43:37 [INFO] CPUs: online/possible=80/80 nr_cores=40
GPUS: []
21:43:37 [INFO] configuring node 0, LLCs 1
21:43:37 [INFO] configuring llc 0 for node 0
21:43:37 [INFO] configuring node 1, LLCs 1
21:43:37 [INFO] configuring llc 1 for node 1

GPU machine:

$ ./scx_layered f:layered.json 
21:44:25 [INFO] CPUs: online/possible=224/224 nr_cores=112
GPUS: [GPU { id: 0, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 1, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 2, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 3, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 4, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 5, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 6, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, GPU { id: 7, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }]

@@ -217,10 +220,48 @@ impl Cache {
}
}

#[derive(Debug, Clone)]
pub struct GPU {
Contributor

Can we do Gpu? We use Cpu in other places.

max_graphics_clock: usize,
// AMD uses CU for this value
max_sm_clock: usize,
memory: u64,
Contributor

Can we just pub these fields instead of adding accessors? That way, it's a lot easier to e.g. unpack the fields in match arms.
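To illustrate the point about match arms, here is a minimal sketch, assuming a struct shaped like the debug output in the PR description. The field types and the `describe` helper are hypothetical and are not taken from the merged code.

```rust
// Sketch only: field names follow the PR's debug output; the types are assumed.
#[derive(Debug, Clone)]
pub struct Gpu {
    pub id: usize,
    pub node_id: usize,
    pub max_graphics_clock: usize,
    pub max_sm_clock: usize,
    pub memory: u64,
}

// Hypothetical helper: with public fields, a caller can destructure
// directly in match arms instead of going through accessor methods.
pub fn describe(gpu: &Gpu) -> String {
    match gpu {
        // Match GPUs attached to NUMA node 0 and bind only the memory size.
        Gpu { node_id: 0, memory, .. } => format!("node 0, {} bytes", memory),
        // Everything else: bind the node and the SM clock.
        Gpu { node_id, max_sm_clock, .. } => {
            format!("node {}, SM clock {}", node_id, max_sm_clock)
        }
    }
}

fn main() {
    let gpu = Gpu {
        id: 0,
        node_id: 0,
        max_graphics_clock: 1980,
        max_sm_clock: 1980,
        memory: 102625181696,
    };
    println!("{}", describe(&gpu));
}
```

With private fields and accessors, the same logic would need a chain of `if` checks on method calls, since patterns cannot call methods.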

for node in &self.nodes {
gpus.extend(node.gpus.values().clone());
}
gpus
Contributor

It can be a bit misleading to give out a vector which isn't indexed by IDs for entities w/ IDs. Maybe provide an iter or return a BTreeMap instead? The IDs are supposed to be unique system-wide, right?

Contributor Author

> The IDs are supposed to be unique system-wide, right?

I wasn't 100% sure on that, especially if a system has a mix of NVIDIA/AMD GPUs. For the NVIDIA case it's the same id that is used by device_by_index. Maybe it's better to use the PCIe bus id instead? I think it would still work with the NVL helpers as well. My thought is that any scheduler-specific use cases that need extra data should still be able to look up the device by the id.

Contributor

Hmmm.... it does node_gpus.insert(gpu.id(), gpu.clone());, so it does assume the ID is unique at least within the node. If NVIDIA/AMD IDs may overlap, maybe the ID should be an enum, i.e. Vendor(u64)? A PCI ID is fine too but can be a bit unwieldy.

Contributor Author

I like the enum idea, will try that out!

> so it does assume the ID is unique at least within the node

Yeah, wasn't sure on that either. There don't seem to be great AMD libraries, so maybe this is a problem for the future, but we should try to do it right the first time.
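The two ideas in this thread (a vendor-tagged ID and a map keyed by that ID) can be sketched together. This is a minimal sketch under stated assumptions: the `Nvidia { nvml_id }` variant mirrors the debug output posted later in this PR, while the `Amd` variant, the trimmed-down `Gpu` and `Node` types, and the `all_gpus` helper are hypothetical.

```rust
use std::collections::BTreeMap;

// Vendor-tagged index so that per-vendor device ids cannot collide.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum GpuIndex {
    Nvidia { nvml_id: u32 },
    // Hypothetical variant: the PR output only shows NVIDIA devices.
    Amd { device_id: u32 },
}

// Trimmed-down stand-in for the topology's GPU record.
#[derive(Debug, Clone)]
pub struct Gpu {
    pub index: GpuIndex,
    pub node_id: usize,
    pub memory: u64,
}

// Stand-in for a NUMA node that owns its GPUs.
pub struct Node {
    pub gpus: BTreeMap<GpuIndex, Gpu>,
}

// Hypothetical helper: merge the per-node maps into one system-wide map
// keyed by GpuIndex, rather than handing out an unindexed Vec.
pub fn all_gpus(nodes: &[Node]) -> BTreeMap<GpuIndex, &Gpu> {
    let mut gpus = BTreeMap::new();
    for node in nodes {
        for (idx, gpu) in &node.gpus {
            gpus.insert(*idx, gpu);
        }
    }
    gpus
}

fn main() {
    let idx = GpuIndex::Nvidia { nvml_id: 0 };
    let mut gpus = BTreeMap::new();
    gpus.insert(idx, Gpu { index: idx, node_id: 0, memory: 102625181696 });
    let nodes = vec![Node { gpus }];
    for (idx, gpu) in all_gpus(&nodes) {
        println!("{:?} -> node {}", idx, gpu.node_id);
    }
}
```

Because the key carries the vendor, an NVIDIA device 0 and an AMD device 0 remain distinct entries, and callers that need vendor-specific data can still recover the raw id from the key.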

Contributor

@arighi left a comment


This is really cool, I'll do some tests with this if I can find some beefy NVIDIA machines tomorrow. Thanks for working on this!

@multics69
Contributor

This is cool! Do AMD GPUs also support something similar to nvml-wrapper?

Add GPU awareness to the topology crate.

Signed-off-by: Daniel Hodges <[email protected]>
@hodgesds
Contributor Author

Had to fix the NUMA node lookup as it was incorrect, but it looks good now:

13:30:46 [INFO] CPUs: online/possible=224/224 nr_cores=112
GPUS: {Nvidia { nvml_id: 0 }: Gpu { index: Nvidia { nvml_id: 0 }, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 1 }: Gpu { index: Nvidia { nvml_id: 1 }, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 2 }: Gpu { index: Nvidia { nvml_id: 2 }, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 3 }: Gpu { index: Nvidia { nvml_id: 3 }, node_id: 0, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 4 }: Gpu { index: Nvidia { nvml_id: 4 }, node_id: 1, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 5 }: Gpu { index: Nvidia { nvml_id: 5 }, node_id: 1, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 6 }: Gpu { index: Nvidia { nvml_id: 6 }, node_id: 1, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }, Nvidia { nvml_id: 7 }: Gpu { index: Nvidia { nvml_id: 7 }, node_id: 1, max_graphics_clock: 1980, max_sm_clock: 1980, memory: 102625181696 }}

@hodgesds
Contributor Author

> This is cool! Do AMD GPUs also support something similar to nvml-wrapper?

It does, I don't know how good the bindings are though. I found this one and will test it on some hardware when I'm at home.

pub struct Gpu {
pub index: GpuIndex,
pub node_id: usize,
pub max_graphics_clock: usize,
Contributor Author

These fields probably need some standardized units appended to them at some point...
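One way to carry units in the field names is sketched below. It assumes the clocks are in MHz and the memory size is in bytes, which is how NVML reports those values; the `GpuSpec` type, the suffixed field names, and `memory_gib` are hypothetical and not part of the merged code.

```rust
// Sketch only: unit suffixes make the scale of each value explicit.
// Assumption: clocks are MHz and memory is bytes.
#[derive(Debug, Clone)]
pub struct GpuSpec {
    pub max_graphics_clock_mhz: usize,
    pub max_sm_clock_mhz: usize,
    pub memory_bytes: u64,
}

impl GpuSpec {
    // Convenience conversion for display purposes.
    pub fn memory_gib(&self) -> f64 {
        self.memory_bytes as f64 / (1u64 << 30) as f64
    }
}

fn main() {
    let spec = GpuSpec {
        max_graphics_clock_mhz: 1980,
        max_sm_clock_mhz: 1980,
        memory_bytes: 102625181696,
    };
    println!(
        "{} MHz graphics, {} MHz SM, {:.1} GiB",
        spec.max_graphics_clock_mhz,
        spec.max_sm_clock_mhz,
        spec.memory_gib()
    );
}
```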

@hodgesds merged commit 5391816 into sched-ext:main on Aug 28, 2024
2 checks passed
@hodgesds deleted the gpu-topo branch on August 28, 2024 13:59
4 participants