Using the enclosed script `time_copy.py`, it is clear that `dpctl.tensor.usm_ndarray.__setitem__` is inefficient when copying a C-contiguous host buffer into a C-contiguous USM array:
(idp_2021.4) [13:25:40 ansatnuc04 python]$ python time_copy.py
Wall time: 0.00044969748705625534 sec.
Device time: 0.00010292000000000001 sec.
Wall time: 4.959066528826952 sec.
Device time: 0.717467438 sec.
This is likely because the copy is performed one element per kernel, so the contiguity of the source and destination is not taken advantage of.
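The suspected cause can be illustrated with a host-only analogy. The sketch below is not dpctl's implementation and does not touch a device; it uses only the Python standard library, and the names `copy_elementwise` and `copy_contiguous` are hypothetical. It contrasts a per-element copy with a single block copy that is taken only when both buffers are C-contiguous and have matching format and size.

```python
import time
from array import array


def copy_elementwise(dst, src):
    """Copy one element at a time (stand-in for one copy operation per element)."""
    for i in range(len(src)):
        dst[i] = src[i]


def copy_contiguous(dst, src):
    """Use a single block copy when both buffers are C-contiguous and compatible."""
    dst_mv, src_mv = memoryview(dst), memoryview(src)
    if (
        dst_mv.c_contiguous
        and src_mv.c_contiguous
        and dst_mv.format == src_mv.format
        and dst_mv.nbytes == src_mv.nbytes
    ):
        dst_mv[:] = src_mv  # one bulk copy of the whole buffer
    else:
        copy_elementwise(dst, src)  # fallback for incompatible layouts


n = 200_000  # assumed problem size, chosen only for illustration
src = array("d", range(n))
dst_slow = array("d", [0.0]) * n
dst_fast = array("d", [0.0]) * n

t0 = time.perf_counter()
copy_elementwise(dst_slow, src)
t1 = time.perf_counter()
copy_contiguous(dst_fast, src)
t2 = time.perf_counter()

print(f"Element-wise wall time: {t1 - t0} sec.")
print(f"Contiguous wall time:   {t2 - t1} sec.")
```

Both paths produce the same result; only the number of copy operations differs, which is the kind of gap the timings above suggest.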
This change restores performance parity between the two scenarios in the `time_copy.py` script:
```
(idp_2021.4) [13:33:21 ansatnuc04 python]$ python time_copy.py
Wall time: 0.0004440806806087494 sec.
Device time: 9.926800000000001e-05 sec.
Wall time: 0.0006928546354174614 sec.
Device time: 0.000150562 sec.
```