Created a temporary copy in case of overlap for unary function #1281
Changes unknown when pulling 03a46e1 on unary_out_overlap into master.
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_7 ran successfully.
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1281/index.html
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_12 ran successfully.
test_square.py::test_sqrt_out_overlap -> test_square.py::test_square_out_overlap
07015f1 to fde917a
The call operator of this struct checks whether two USM ND-arrays logically address the same memory elements. When a data-parallel kernel reads from and writes to arrays that logically address the same memory elements, there is no race condition and no additional copy is needed.
The predicate determines whether the argument arrays are the same (same dimension, shape, data type, pointer, strides). It is used to decide whether a copy must be made when the arrays overlap, to avoid a race condition.
If the out array is logically the same as the input array, there is no race condition, so the temporary copy is skipped.
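The predicate described in these commit messages can be sketched in plain Python. This is only an illustration under stated assumptions: `ArrayView` and `same_logical_tensors` are hypothetical names, not the identifiers used in dpctl, and the address is modeled as a plain integer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArrayView:
    """Hypothetical stand-in for a USM ND-array descriptor."""
    ptr: int                  # address of the first element
    dtype: str                # element type name
    shape: Tuple[int, ...]    # extent along each dimension
    strides: Tuple[int, ...]  # step, in elements, along each dimension


def same_logical_tensors(a: ArrayView, b: ArrayView) -> bool:
    """Return True when both views address the same elements in the same order."""
    return (
        a.ptr == b.ptr
        and a.dtype == b.dtype
        and len(a.shape) == len(b.shape)
        and a.shape == b.shape
        and a.strides == b.strides
    )


x = ArrayView(ptr=0x1000, dtype="float32", shape=(4, 3), strides=(3, 1))
same = ArrayView(ptr=0x1000, dtype="float32", shape=(4, 3), strides=(3, 1))
shifted = ArrayView(ptr=0x1004, dtype="float32", shape=(4, 3), strides=(3, 1))
transposed = ArrayView(ptr=0x1000, dtype="float32", shape=(3, 4), strides=(1, 3))

print(same_logical_tensors(x, same))        # identical view: no copy needed
print(same_logical_tensors(x, shifted))     # different start: overlap must be handled
print(same_logical_tensors(x, transposed))  # same memory, different element order
```

When the predicate holds, each work-item reads and writes only its own element, which is why the temporary copy can be skipped in that case.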
fde917a to 09cd171
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_15 ran successfully.
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_18 ran successfully.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
The PR proposes to allocate a temporary buffer rather than raise an exception when memory overlap is detected between the `in` and `out` arrays in a call to a unary function. The change is intended to make the code example below valid:
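A minimal pure-Python sketch of the temporary-buffer idea, not dpctl code: a list models the shared allocation, `apply_unary` is a hypothetical helper, and the sequential loop stands in for just one possible execution order of a data-parallel kernel (a real kernel gives no ordering guarantee, so the unsafe result would be unpredictable rather than fixed).

```python
def apply_unary(fn, buf, in_start, out_start, n, use_temp):
    """Apply fn to n elements read at in_start and write results at out_start.

    With use_temp=True the input is copied first, so later reads never see
    values that earlier writes have already overwritten.
    """
    src = buf[in_start:in_start + n] if use_temp else None  # slicing copies
    for i in range(n):
        value = src[i] if use_temp else buf[in_start + i]
        buf[out_start + i] = fn(value)
    return buf


def square(v):
    return v * v


# Input covers elements 0..3 and output covers elements 1..4: they overlap.
naive = apply_unary(square, [1, 2, 3, 4, 5], 0, 1, 4, use_temp=False)
safe = apply_unary(square, [1, 2, 3, 4, 5], 0, 1, 4, use_temp=True)

print(naive)  # [1, 1, 1, 1, 1]: each read sees an already overwritten value
print(safe)   # [1, 1, 4, 9, 16]: squares of the original input
```

The extra copy costs one allocation and one pass over the input, which is the trade-off accepted by this approach.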
The temporary-buffer approach was chosen as a starting point for supporting this use case because it is the easiest to implement.
The next step might be to add separate kernels to handle in-place unary operations (when both `in` and `out` arrays point to the same memory), if the performance gain is noticeable.