print_corners #1187
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1187/index.html
For array of rank … But in our case the … Also, instead of decoding an integer to a sequence of bits using a string, perhaps consider using …
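The alternative being suggested is not shown above, so the following is only an illustrative sketch and not necessarily what the reviewer had in mind. It contrasts decoding an integer into bits through a string with one string-free option based on shifts and masks. Both function names are hypothetical and do not come from dpctl.

```python
def bits_via_string(k, nbits):
    """Decode k into nbits bits (most significant first) by formatting it as a binary string."""
    return [int(ch) for ch in format(k, f"0{nbits}b")]


def bits_via_shifts(k, nbits):
    """Decode k into nbits bits (most significant first) using shifts and masks only."""
    return [(k >> i) & 1 for i in reversed(range(nbits))]


if __name__ == "__main__":
    # Each bit could select the leading (0) or trailing (1) block along one axis.
    for k in range(4):
        print(k, bits_via_shifts(k, 2))
```

The shift-based version avoids building and parsing a temporary string for every corner index, which may matter when the decoding sits in a hot loop.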
Array API standard conformance tests for dpctl=0.14.3dev1=py310h76be34b_14 ran successfully.
eaa63ee to e24fe4e
Before:
In [1]: import dpctl.tensor as dpt, dpctl
In [2]: m = dpt.ones((17, 15, 4, 31, 9, 4, 13), dtype="i2")
In [3]: from dpctl.tensor._print import _nd_corners
In [4]: %timeit -n 500 -r 12 dpt.asnumpy(_nd_corners(m, 3)).shape
14.2 ms ± 1.29 ms per loop (mean ± std. dev. of 12 runs, 500 loops each)
With changes from this PR:
In [1]: import dpctl.tensor as dpt, dpctl
In [2]: m = dpt.ones((17, 15, 4, 31, 9, 4, 13), dtype="i2")
In [3]: from dpctl.tensor._print import _nd_corners
In [4]: %timeit -n 500 -r 12 _nd_corners(m, 3).shape
4.72 ms ± 357 µs per loop (mean ± std. dev. of 12 runs, 500 loops each)
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly.
In this PR, the recursive method used in the dpctl.tensor._print._nd_corners function is replaced with an iterative method to improve performance.
x_dpt = dpt.reshape(dpt.arange(6*6*117*117, dtype='i4'), (6, 117, 117, 6))
%timeit -r 20 dpt.usm_ndarray_repr(x_dpt)
New timing: 4.55 ms ± 645 µs per loop (mean ± std. dev. of 20 runs, 100 loops each)
Old timing: 6.43 ms ± 2.31 ms per loop (mean ± std. dev. of 20 runs, 100 loops each)
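To illustrate what an iterative corner extraction can look like, here is a minimal sketch written against NumPy rather than dpctl. It is not the code from this PR: the name nd_corners_sketch and the use of itertools.product to enumerate corner blocks are assumptions made for the example. For every axis longer than 2 * edge_items it keeps the leading and trailing edge_items entries, and it copies each corner block into a preallocated output in a flat loop, with no recursion.

```python
import itertools

import numpy as np


def nd_corners_sketch(arr, edge_items):
    """Return the corner blocks of arr, keeping edge_items entries at both ends of each long axis."""
    per_axis = []   # for each axis: list of (source slice, destination slice) pairs
    out_shape = []
    for n in arr.shape:
        if n > 2 * edge_items:
            # Long axis: keep a leading block and a trailing block.
            per_axis.append([
                (slice(0, edge_items), slice(0, edge_items)),
                (slice(n - edge_items, n), slice(edge_items, 2 * edge_items)),
            ])
            out_shape.append(2 * edge_items)
        else:
            # Short axis: keep it whole.
            per_axis.append([(slice(0, n), slice(0, n))])
            out_shape.append(n)

    out = np.empty(tuple(out_shape), dtype=arr.dtype)
    # One iteration per corner block; the loop replaces per-axis recursion.
    for combo in itertools.product(*per_axis):
        src = tuple(s for s, _ in combo)
        dst = tuple(d for _, d in combo)
        out[dst] = arr[src]
    return out


if __name__ == "__main__":
    m = np.ones((17, 15, 4, 31, 9, 4, 13), dtype="i2")
    print(nd_corners_sketch(m, 3).shape)
```

Preallocating the output and filling it block by block avoids building intermediate arrays at every recursion level, which is one plausible source of the kind of speedup reported above; the sketch itself makes no timing claim.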