fix stream handling in multi-op device tasks #421

Merged
evaleev merged 13 commits into master from 420-stream-assignment-to-device-tasks-should-be-sticky
Sep 29, 2023

@evaleev evaleev commented Sep 25, 2023

resolves #420 and #422

@evaleev evaleev force-pushed the 420-stream-assignment-to-device-tasks-should-be-sticky branch from 1393bc3 to b4b5997 Compare September 26, 2023 19:58
@evaleev evaleev force-pushed the 420-stream-assignment-to-device-tasks-should-be-sticky branch from ca59376 to 5ea446b Compare September 27, 2023 15:35
evaleev commented Sep 27, 2023

currently um_expressions_suite/*cont* and librett_suite/librett_gpu_mem, e.g.

image

and

https://gitlab.com/ValeevGroup/tiledarray/-/jobs/5177464289#L1687

…, but in task body so that streams are per-task, not per thread in case a task recursively executes other tasks by doing Future::get(dowork=true)
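
A minimal sketch of the idea in the commit message above, under stated assumptions: `StreamId`, `tls_current_stream`, `TaskStreamScope`, and `run_task` are hypothetical names invented for illustration, not TiledArray or MADNESS API, and plain integers stand in for device streams. It shows why a stream bound only per thread can change under a multi-op task when a nested task runs inline on the same thread, and how binding the stream inside the task body (and restoring the previous binding on exit) keeps it per-task.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a device stream handle; not a TiledArray or MADNESS type.
using StreamId = int;
constexpr StreamId kNoStream = -1;

// Per-thread slot naming the stream of the task body currently executing on this thread.
thread_local StreamId tls_current_stream = kNoStream;

// RAII scope created inside a task body: binds the task's stream on entry and
// restores whatever the enclosing task had bound on exit.
class TaskStreamScope {
 public:
  explicit TaskStreamScope(StreamId s) : saved_(tls_current_stream) {
    tls_current_stream = s;
  }
  ~TaskStreamScope() { tls_current_stream = saved_; }
  TaskStreamScope(const TaskStreamScope&) = delete;
  TaskStreamScope& operator=(const TaskStreamScope&) = delete;

 private:
  StreamId saved_;
};

// Runs a task body inline on the calling thread with its own stream, which is
// what a blocking wait that keeps doing work may end up doing.
// Returns the stream still bound after the body has finished.
StreamId run_task(StreamId stream, const std::function<void()>& body) {
  TaskStreamScope scope(stream);
  body();
  return tls_current_stream;
}

// An outer task issues two operations with a nested task executed in between.
// Returns true if both operations saw the outer task's stream.
bool outer_task_keeps_its_stream() {
  TaskStreamScope scope(1);
  const StreamId first_op = tls_current_stream;
  run_task(2, [] {});  // nested task runs inline on this thread
  const StreamId second_op = tls_current_stream;
  return first_op == 1 && second_op == 1;
}
```

Without the save-and-restore in `TaskStreamScope`, the nested task would leave stream 2 in the thread-local slot and the outer task's second operation would be issued on a different stream than its first.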
@evaleev evaleev force-pushed the 420-stream-assignment-to-device-tasks-should-be-sticky branch from de1c368 to 2134056 Compare September 28, 2023 18:14
@evaleev evaleev force-pushed the 420-stream-assignment-to-device-tasks-should-be-sticky branch from 2134056 to dfa3f76 Compare September 28, 2023 18:25
@evaleev evaleev force-pushed the 420-stream-assignment-to-device-tasks-should-be-sticky branch from c612b71 to 4b79b5a Compare September 28, 2023 23:47
@evaleev evaleev merged commit 3ce7fdc into master Sep 29, 2023
8 checks passed
@evaleev evaleev deleted the 420-stream-assignment-to-device-tasks-should-be-sticky branch September 22, 2024 12:19
Development

Successfully merging this pull request may close these issues.

support for multiple compute devices / MPI rank
stream assignment to device tasks should be sticky