Need barrier after call to custom inclusive scan to avoid race condition #1624

Merged: ndgrigorian merged 1 commit into feature/tensor-accumulation from proposed-refinement-to-fix-for-tensor-accumulation-race-condition on Apr 1, 2024
Conversation
Added comments explaining why barriers are needed.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_193 ran successfully.
ndgrigorian approved these changes on Apr 1, 2024:
LGTM!
ndgrigorian merged commit b1016bf into feature/tensor-accumulation on Apr 1, 2024 (31 of 37 checks passed).
ndgrigorian deleted the proposed-refinement-to-fix-for-tensor-accumulation-race-condition branch on April 1, 2024 at 06:19.
ndgrigorian added a commit that referenced this pull request on Apr 1, 2024:
* Use `shT` instead of `std::vector<py::ssize_t>` in `repeat`
* Add missing host task to `host_tasks_list` in _reduction.py
* Implements `dpt.cumulative_logsumexp`, `dpt.cumulative_prod`, and `dpt.cumulative_sum`. The Python bindings for these functions are implemented in a new submodule `_tensor_accumulation_impl`
* Adds the first tests for `dpt.cumulative_sum`
* Pass host task vector to accumulator kernel calls. This resolves hangs in unique functions
* Implements `out` keyword for accumulators
* Fixes cumulative_logsumexp when both an intermediate input and result temporary are needed
* Only permute dims of allocated outputs if accumulated axis is not the trailing axis. Fixes a bug where in some cases output axes were not being permuted
* Enable scalar inputs to accumulation functions
* Adds test for scalar inputs to cumulative_sum
* Adds docstrings for cumulative_sum, cumulative_prod, and cumulative_logsumexp
* Removed redundant dtype kind check in _default_accumulation_dtype
* Reduce repetition of code allocating the out array in _accumulate_common
* Adds tests for accumulation function identities and the `include_initial` keyword
* Adds more tests for cumulative_sum
* Correct typo in kernels/accumulators.hpp: `constexpr nwiT` variables rather than `nwiT constexpr` variables
* Increase work per work item in inclusive_scan_iter_1d update step
* Removes a dead branch from _accumulate_common. As `out` and the input would have to have the same data type to overlap, the second branch is never reached if `out` is the same array as the input
* More accumulator tests
* Removes dead branch from _accumulators.py. A second out temporary does not need to be made in either branch when the input and requested dtypes are not implemented, as temporaries are always made. Also removes part of a test intended to reach this branch
* Adds tests for `cumulative_prod` and `cumulative_logsumexp`. Also fixes incorrect TypeError in _accumulation.py
* Widen acceptable results of test_logcumsumexp
* Use np.logaddexp.accumulate in hopes of better numerical accuracy of the expected result for cumulative_logsumexp
* Attempt to improve cumulative_logsumexp testing by computing a running logsumexp of the test array
* Reduce size of array in test_logcumsumexp_basic
* Use const qualifiers to make the compiler's job easier. Indexers are made const; integral variables in kernels are made const too. Made two-offset instances const references to avoid copying. Got rid of unused get_src_const_ptr methods in stack_t structs. Replaced auto with size_t as appropriate. Added const to make compiler analysis easier (and faster)
* Add test for cumulative_logsumexp summing a geometric series, testing against the closed form
* Fix race condition in `custom_inclusive_scan_over_group`. By returning data from `local_mem_acc` after the group barrier, a race condition follows if the memory is later overwritten; this was especially obvious on CPU. Now the value is stored in a variable before the barrier and then returned
* Remove use of NumPy functions from test_tensor_accumulation and increase size of test_logcumsumexp_basic
* Need barrier after call to custom inclusive scan to avoid race condition (#1624). Added comments explaining why barriers are needed
* Docstring edits. Add an empty new line after a list item to make Sphinx happy.

Co-authored-by: Oleksandr Pavlyk <[email protected]>
A barrier is needed after the call to the custom inclusive scan to avoid a race condition with subsequent writes into SLM (shared local memory): without it, a work-item may overwrite a local-memory slot that another work-item has not yet finished reading. Added comments explaining why the barriers are needed.