UCT/CUDA: treat stitched VA as managed memory #10459
base: master
Conversation
@SeyedMir Can you please test these changes on x86 platforms and see if the expected protocols are being chosen?
base_address = (CUdeviceptr)address;
alloc_length = length;
Maybe goto out_default_range instead? Also, some refactoring could be done, e.g. to have a trace for all cases, including the new one; a possible shape is sketched below.
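A rough standalone sketch of that shape (illustrative only, not the actual uct_cuda_copy_md_query_attributes code; the ucs_trace header path is from memory): every fallback path jumps to a single default-range label, and one trace at the end covers all cases, including the new stitched-VA one.

#include <cuda.h>
#include <stddef.h>
#include <stdint.h>
#include <ucs/debug/log.h>

static void query_range_sketch(const void *address, size_t length)
{
    CUdeviceptr base_address;
    size_t alloc_length;

    /* Could not resolve the range at all: use the user-provided one */
    if (cuMemGetAddressRange(&base_address, &alloc_length,
                             (CUdeviceptr)address) != CUDA_SUCCESS) {
        goto out_default_range;
    }

    /* Stitched VA (the new case): also fall back to the user-provided range */
    if ((uintptr_t)base_address + alloc_length < (uintptr_t)address + length) {
        goto out_default_range;
    }

    goto out;

out_default_range:
    base_address = (CUdeviceptr)address;
    alloc_length = length;
out:
    ucs_trace("address %p length %zu: base_address 0x%llx alloc_length %zu",
              address, length, (unsigned long long)base_address, alloc_length);
}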
base_address = (CUdeviceptr)address;
alloc_length = length;
}
Is cuMemGetAddressRange support for VMM allocations documented somewhere? The driver API docs mention legacy allocators only:
"Returns the base address in *pbase and size in *psize of the allocation by cuMemAlloc() or cuMemAllocPitch()."
Indeed, it is not well documented. The behavior we've seen is that it is able to provide a base pointer and size corresponding to the base address and length of the address space mapped by the corresponding cuMemMap.
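For reference, a minimal standalone repro of that observed behavior (assumptions: device 0, error checking omitted; this is not code from the PR):

#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdeviceptr va, base;
    CUcontext ctx;
    CUmemGenericAllocationHandle handle;
    CUmemAllocationProp prop = {0};
    CUmemAccessDesc access   = {0};
    size_t granularity, size, range_size;

    cuInit(0);
    cuDevicePrimaryCtxRetain(&ctx, 0);
    cuCtxSetCurrent(ctx);

    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = 0;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size = 2 * granularity;

    /* Reserve a VA range and map a VMM (cuMemCreate) allocation into it */
    cuMemAddressReserve(&va, size, 0, 0, 0);
    cuMemCreate(&handle, size, &prop, 0);
    cuMemMap(va, size, 0, handle, 0);
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, size, &access, 1);

    /* Query from an interior pointer: per the behavior described above this
     * returns the base and length of the mapped range, even though the docs
     * only mention cuMemAlloc()/cuMemAllocPitch() */
    cuMemGetAddressRange(&base, &range_size, va + granularity);
    printf("base=0x%llx size=%zu (mapped: base=0x%llx size=%zu)\n",
           (unsigned long long)base, range_size, (unsigned long long)va, size);
    return 0;
}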
@@ -689,6 +689,13 @@ uct_cuda_copy_md_query_attributes(uct_cuda_copy_md_t *md, const void *address,
        return UCS_ERR_INVALID_ADDR;
    }

    if ((uintptr_t)base_address + alloc_length < (uintptr_t)address + length) {
Suggested change:
if ((uintptr_t)base_address + alloc_length < (uintptr_t)address + length) {
if (UCS_PTR_BYTE_OFFSET(base_address, alloc_length) <
    UCS_PTR_BYTE_OFFSET(address, length)) {
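Assuming UCS_PTR_BYTE_OFFSET does plain byte arithmetic and yields a void* (as it is used elsewhere in UCS), the suggested form is equivalent to the cast-based check, just without the scattered uintptr_t casts:

/* Equivalent spelling under that assumption; base_address is a CUdeviceptr,
 * which the macro accepts since it casts its arguments internally */
void *alloc_end = UCS_PTR_BYTE_OFFSET(base_address, alloc_length);
void *range_end = UCS_PTR_BYTE_OFFSET(address, length);
if (alloc_end < range_end) {
    /* the queried range extends past the allocation reported by the driver */
}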
Some test failures seem relevant.
What?
Detect VA ranges composed of multiple physical allocations and treat them as managed memory to force pipeline protocols.
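A rough standalone sketch of the detection idea (illustrative, not the PR code): if the allocation that cuMemGetAddressRange() reports for the start of the user range ends before the user range does, the VA must be stitched together from several physical allocations, and the memory domain can then report it as cuda-managed so that the pipeline protocols are chosen.

#include <cuda.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when [address, address + length) spans more than the single
 * allocation that the driver reports for its starting address */
static bool va_is_stitched(const void *address, size_t length)
{
    CUdeviceptr base_address;
    size_t alloc_length;

    if (cuMemGetAddressRange(&base_address, &alloc_length,
                             (CUdeviceptr)address) != CUDA_SUCCESS) {
        return false; /* cannot tell; let the caller use its default path */
    }

    return (uintptr_t)base_address + alloc_length <
           (uintptr_t)address + length;
}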
Initial tests on NVLink-connected GPUs show the expected protocols being selected: