No need to call _copy_overlapping if src and dst address same memory #1284
Shouldn't it also fix sqrt with 'out' for pairwise distance?
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1284/index.html
```
In [1]: import dpctl.tensor as dpt, dpctl, dpctl.utils

In [2]: n, m = 8 * 540, 8 * 960

In [3]: a = dpt.ones((m, n))

In [4]: b = dpt.zeros((m, n))

In [5]: b_s = dpt.zeros((m, n+2))

In [6]: with dpctl.utils.onetrace_enabled():
   ...:     b_s[:,:-2] += a
   ...:
Device Timeline (queue: 0x556080b9cea0): zeCommandListAppendMemoryCopy(H2D)[48 bytes]<4.1> [ns] = 16946404661 (append) 16952292497 (submit) 16952613747 (start) 16952623538 (end)
Device Timeline (queue: 0x556080b9cea0): dpctl::tensor::kernels::add::add_inplace_strided_kernel<float, float, dpctl::tensor::offset_utils::TwoOffsets_StridedIndexer>[SIMD32 {64800; 1; 1} {512; 1; 1}]<5.1> [ns] = 17017855801 (append) 17018342202 (submit) 17019138920 (start) 17030770482 (end)
```

Earlier, two more copy operations were being performed as well.
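For readers skimming the thread, here is a rough, pure-Python sketch of the idea in the PR title: when the source and destination of an assignment describe exactly the same memory (same start address, shape, and strides), no copy is needed. This is not dpctl's implementation. `View`, `same_memory`, `may_overlap`, and `plan_copy` are hypothetical names invented for illustration, and the overlap test is a conservative byte-range check.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for a strided array view (not a dpctl type)."""
    ptr: int                   # address of the first element, in bytes
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]   # strides in elements, may be negative
    itemsize: int = 4          # bytes per element (float32 assumed)


def same_memory(src: View, dst: View) -> bool:
    """True if both views address exactly the same elements in the same order."""
    return (
        src.ptr == dst.ptr
        and src.shape == dst.shape
        and src.strides == dst.strides
        and src.itemsize == dst.itemsize
    )


def byte_extent(v: View) -> Tuple[int, int]:
    """Half-open byte range [lo, hi) spanned by the view."""
    lo = hi = 0  # element offsets relative to ptr
    for n, s in zip(v.shape, v.strides):
        if n == 0:
            return (v.ptr, v.ptr)  # empty view touches no memory
        if s >= 0:
            hi += (n - 1) * s
        else:
            lo += (n - 1) * s
    return (v.ptr + lo * v.itemsize, v.ptr + (hi + 1) * v.itemsize)


def may_overlap(a: View, b: View) -> bool:
    """Conservative check: do the byte ranges of the two views intersect?"""
    a_lo, a_hi = byte_extent(a)
    b_lo, b_hi = byte_extent(b)
    return a_lo < b_hi and b_lo < a_hi


def plan_copy(src: View, dst: View) -> str:
    """Decide how to copy src into dst."""
    if same_memory(src, dst):
        return "skip"                 # nothing to do, data is already in place
    if may_overlap(src, dst):
        return "copy via temporary"   # partial overlap needs care
    return "direct copy"


# A view like b_s[:, :-2]: rows are 8 elements apart, 6 columns are used.
view = View(ptr=4096, shape=(4, 6), strides=(8, 1))
print(plan_copy(view, view))
```

In the `b_s[:,:-2] += a` example above, the result of the in-place add is written back to the very view it was computed into, which is the "skip" case in this sketch.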
17a2623 to 701c05b
Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_11 ran successfully.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞