Dedicated code to copy array to C-contig/F-contig destinations #1850
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_72 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_73 ran successfully.
Examples:
import dpctl.tensor as dpt
x = dpt.ones((3, 10, 10), order='F')
y = dpt.empty_like(x, order='C')
# now uses generic kernel to copy to contiguous destination
y[:] = x
x2 = dpt.moveaxis(dpt.ones((10, 10, 3), order='F'), 2, 0)
# Because x2 has shape (3, 10, 10) and strides (100, 1, 10),
# x2 is a batch of F-contig square matrices, and the following code uses
# the faster kernel for copying
y2 = dpt.asarray(x2, order='C')
Here is a demonstration on a laptop with an Iris Xe integrated GPU:
On GPU Max, the difference between the timings in In[12]/In[13] (about the same as the legacy timing before this PR) and In[4]/In[5] is more pronounced (25%), as is the difference between In[12]/In[13] and In[8]/In[9].
This is done more efficiently than with the generic copy-and-cast kernel. It is done more efficiently still for a batch of square matrices.
Copy from (batch of views into C-contig matrices) to F-contig array of the same shape:
src.shape = (n, n, ...)
src.strides = (ld_src, 1, ...)
Copy from (batch of views into F-contig matrices) to C-contig array of the same shape:
src.shape = (..., n, n)
src.strides = (..., 1, ld_src)
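The two source layouts described above can be sketched as predicates on a (shape, strides) pair, with strides counted in elements. This is only an illustration of the patterns, not the dispatch logic in this PR: the function names are hypothetical, the `ld_src >= n` condition is an assumption about what a valid view into a larger matrix looks like, and the batch strides are not checked at all.

```python
def looks_like_batch_of_f_contig_squares(shape, strides):
    """Pattern: shape = (..., n, n), strides = (..., 1, ld_src).

    Hypothetical helper; candidates for the copy to a C-contig destination.
    """
    if len(shape) < 2 or len(shape) != len(strides):
        return False
    n_rows, n_cols = shape[-2], shape[-1]
    row_stride, ld_src = strides[-2], strides[-1]
    # Assumption: the leading dimension must cover at least one column of n rows.
    return n_rows == n_cols and row_stride == 1 and ld_src >= n_rows


def looks_like_batch_of_c_contig_squares(shape, strides):
    """Pattern: shape = (n, n, ...), strides = (ld_src, 1, ...).

    Hypothetical helper; candidates for the copy to an F-contig destination.
    """
    if len(shape) < 2 or len(shape) != len(strides):
        return False
    n_rows, n_cols = shape[0], shape[1]
    ld_src, col_stride = strides[0], strides[1]
    # Assumption: the leading dimension must cover at least one row of n columns.
    return n_rows == n_cols and col_stride == 1 and ld_src >= n_cols


# x2 from the example: shape (3, 10, 10), strides (100, 1, 10)
print(looks_like_batch_of_f_contig_squares((3, 10, 10), (100, 1, 10)))  # True
# A plain C-contig array of the same shape does not match the pattern
print(looks_like_batch_of_f_contig_squares((3, 10, 10), (100, 10, 1)))  # False
```

For the `x2` array in the example, `n = 10` and `ld_src = 10`, so each matrix in the batch is a full F-contig square rather than a sub-view of a larger one.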
005eaf7 to f4705c0
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_74 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_75 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_76 ran successfully.
779083c to 610d88d
I've tested the branch out and haven't run into any issues, including after running the copy tests in libtensor: no failures.
LGTM
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_77 ran successfully.
All tests for |
This PR adds specialized kernels to copy a usm_ndarray to C-/F-contiguous destinations of the same shape and the same dtype. It also adds dedicated kernels to copy batches of square matrices that are views of F-contig matrices to C-contiguous destinations, and batches of square matrices that are views of C-contig matrices to F-contiguous destinations. The intended usage is to speed up conversion from a C-contig batch of square matrices to an F-contig batch of square matrices.
Tests are added.
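For readers less familiar with the terminology, here is a minimal sketch of what C- and F-contiguous mean in terms of element strides, and how moving the last axis to the front produces the stride pattern quoted in the example. It is plain Python and does not call dpctl; `contig_strides` and `move_last_axis_to_front` are hypothetical helpers written for illustration only.

```python
def contig_strides(shape, order="C"):
    """Element strides of a contiguous array with the given shape.

    order="C": the last axis varies fastest; order="F": the first axis does.
    """
    strides = []
    step = 1
    dims = reversed(shape) if order == "C" else shape
    for extent in dims:
        strides.append(step)
        step *= extent
    if order == "C":
        strides.reverse()
    return tuple(strides)


def move_last_axis_to_front(values):
    """Permute a shape or strides tuple the way moving axis -1 to axis 0 would."""
    return (values[-1],) + tuple(values[:-1])


shape = (10, 10, 3)
f_strides = contig_strides(shape, order="F")
print(f_strides)                                  # (1, 10, 100)
print(move_last_axis_to_front(shape))             # (3, 10, 10)
print(move_last_axis_to_front(f_strides))         # (100, 1, 10)
print(contig_strides((3, 10, 10), order="C"))     # (100, 10, 1)
```

The third line of output is the source stride pattern from the example, and the fourth is the stride pattern of the C-contiguous destination it is copied into.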