Copying from numpy into usm_ndarray is unnecessarily slow #723

Closed
oleksandr-pavlyk opened this issue Dec 13, 2021 · 0 comments
@oleksandr-pavlyk
Collaborator

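Running the enclosed script time_copy.py shows that dpctl.tensor.usm_ndarray.__setitem__ is inefficient when copying a C-contiguous host buffer into a C-contiguous USM array. The first pair of timings is for a direct copy with usm_data.copy_from_host, the second for dpt.asarray: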

```
(idp_2021.4) [13:25:40 ansatnuc04 python]$ python time_copy.py
Wall time:  0.00044969748705625534  sec.
Device time:  0.00010292000000000001  sec.
Wall time:  4.959066528826952  sec.
Device time:  0.717467438  sec.
```

This is likely because the copy is done one element per kernel, so the contiguity of the source and destination is not taken advantage of.
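To make the suspected cause concrete, here is a minimal host-only sketch of the dispatch being described: when both buffers are C-contiguous with the same element format, copy the raw bytes in one operation; otherwise fall back to a per-element loop. It uses only the Python standard library and is an analogy, not dpctl's implementation. The function name `copy_buffer` and its return values are hypothetical, and it assumes one-dimensional buffers.

```python
from array import array


def copy_buffer(src, dst):
    """Copy a 1-D buffer `src` into a writable 1-D buffer `dst`.

    Hypothetical helper for illustration only. Returns the name of the
    path taken: "bulk" or "elementwise".
    """
    src_view = memoryview(src)
    dst_view = memoryview(dst)
    if len(src_view) != len(dst_view):
        raise ValueError("source and destination lengths differ")

    same_layout = (
        src_view.c_contiguous
        and dst_view.c_contiguous
        and src_view.format == dst_view.format
    )
    if same_layout:
        # Fast path: reinterpret both buffers as raw bytes and copy once.
        dst_view.cast("B")[:] = src_view.cast("B")
        return "bulk"

    # Slow path: one assignment per element (the pattern the issue suspects).
    for i in range(len(src_view)):
        dst_view[i] = src_view[i]
    return "elementwise"


contiguous_src = array("d", [0.5, 1.5, 2.5, 3.5])
dst = array("d", [0.0] * 4)
path = copy_buffer(contiguous_src, dst)

# A strided view is not C-contiguous, so it takes the slow path.
strided_src = memoryview(array("d", range(8)))[::2]
dst_strided = array("d", [0.0] * 4)
path_strided = copy_buffer(strided_src, dst_strided)
```

The fast path corresponds to what the first scenario in time_copy.py does explicitly by viewing the host array as bytes before copying.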

```
(idp_2021.4) [13:27:17 ansatnuc04 python]$ python -c "import dpctl; print(dpctl.__version__)"
0.12.0dev1+91.gb7a15ed9
```
time_copy.py script
```python
# time_copy.py
import numpy as np

import dpctl
import dpctl.tensor as dpt

n = 8 * 1024
host_array = np.random.random(size=n)

# profiling must be enabled on the queue for SyclTimer to report device time
q = dpctl.SyclQueue("gpu", property="enable_profiling")

timer0 = dpctl.SyclTimer(time_scale=1)  # report duration in seconds
with timer0(q):
    # scenario 1: allocate a USM array, then copy the host array's raw bytes
    # directly into its underlying USM memory
    usm_array = dpt.empty(host_array.shape,
                          dtype=host_array.dtype,
                          sycl_queue=q)
    usm_array.usm_data.copy_from_host(host_array.reshape(-1).view("u1"))

host_time, device_time = timer0.dt

print("Wall time: ", host_time, " sec.")
print("Device time: ", device_time, " sec.")

timer1 = dpctl.SyclTimer(time_scale=1)  # report duration in seconds
with timer1(q):
    # scenario 2: let dpt.asarray convert the host array
    usm_array = dpt.asarray(host_array, sycl_queue=q)

host_time, device_time = timer1.dt

print("Wall time: ", host_time, " sec.")
print("Device time: ", device_time, " sec.")
```
oleksandr-pavlyk self-assigned this Dec 13, 2021
oleksandr-pavlyk added a commit that referenced this issue Dec 13, 2021
Restores parity in performance of two scenarios in time_copy.py script

```
(idp_2021.4) [13:33:21 ansatnuc04 python]$ python time_copy.py
Wall time:  0.0004440806806087494  sec.
Device time:  9.926800000000001e-05  sec.
Wall time:  0.0006928546354174614  sec.
Device time:  0.000150562  sec.
```
oleksandr-pavlyk added a commit that referenced this issue Dec 14, 2021
Fixes #723: slow item assignment from numpy array