No need to call _copy_overlapping if src and dst address same memory #1284

Merged
merged 1 commit into from
Jul 17, 2023

oleksandr-pavlyk
Collaborator

In [1]: import dpctl.tensor as dpt, dpctl, dpctl.utils

In [2]: n, m = 8 * 540, 8 * 960

In [3]: a = dpt.ones((m, n))

In [4]: b = dpt.zeros((m, n))

In [5]: b_s = dpt.zeros((m, n+2))

In [6]: with dpctl.utils.onetrace_enabled():
   ...:     b_s[:,:-2] += a
   ...:
Device Timeline (queue: 0x556080b9cea0): zeCommandListAppendMemoryCopy(H2D)[48 bytes]<4.1> [ns] = 16946404661 (append) 16952292497 (submit) 16952613747 (start) 16952623538 (end)
Device Timeline (queue: 0x556080b9cea0): dpctl::tensor::kernels::add::add_inplace_strided_kernel<float, float, dpctl::tensor::offset_utils::TwoOffsets_StridedIndexer>[SIMD32 {64800; 1; 1} {512; 1; 1}]<5.1> [ns] = 17017855801 (append) 17018342202 (submit) 17019138920 (start) 17030770482 (end)

Before this change, two additional copy operations were performed as well.
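
For context, Python evaluates `x[idx] += y` as a get, an in-place add, and then an assignment of the result back into `x[idx]`, so the final step copies a view onto itself. The sketch below illustrates the check named in the PR title under simplifying assumptions. It is not dpctl's implementation: `ViewDescriptor`, `byte_extent`, `same_memory`, `extents_overlap`, and `needs_temporary` are hypothetical names, the addresses are made up, and views are modeled as a base address plus shape and non-negative byte strides. The 4-byte item size is assumed from the `float` kernel in the trace.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ViewDescriptor:
    """Hypothetical description of a strided array view (illustration only)."""
    address: int              # byte address of element [0, ..., 0]
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]  # in bytes; assumed non-negative in this sketch
    itemsize: int


def byte_extent(v: ViewDescriptor) -> Tuple[int, int]:
    """Half-open byte range [lo, hi) spanned by the view."""
    if any(n == 0 for n in v.shape):
        return v.address, v.address
    last = sum((n - 1) * s for n, s in zip(v.shape, v.strides))
    return v.address, v.address + last + v.itemsize


def same_memory(src: ViewDescriptor, dst: ViewDescriptor) -> bool:
    """True when both views address exactly the same elements."""
    return src == dst  # dataclass equality compares every field


def extents_overlap(src: ViewDescriptor, dst: ViewDescriptor) -> bool:
    """Conservative test: do the spanned byte ranges intersect?"""
    s_lo, s_hi = byte_extent(src)
    d_lo, d_hi = byte_extent(dst)
    return s_lo < d_hi and d_lo < s_hi


def needs_temporary(src: ViewDescriptor, dst: ViewDescriptor) -> bool:
    """Copying a view onto itself is a no-op, so skip the overlap path."""
    if same_memory(src, dst):
        return False
    return extents_overlap(src, dst)


# Shapes from the session above: b_s has shape (m, n + 2).
n, m = 8 * 540, 8 * 960
itemsize = 4      # assumed float32
base = 0x1000     # made-up address
row_stride = (n + 2) * itemsize

view = ViewDescriptor(base, (m, n), (row_stride, itemsize), itemsize)        # b_s[:, :-2]
view_again = ViewDescriptor(base, (m, n), (row_stride, itemsize), itemsize)  # same view
shifted = ViewDescriptor(base + 2 * itemsize, (m, n), (row_stride, itemsize), itemsize)  # b_s[:, 2:]
other = ViewDescriptor(base + m * row_stride, (m, n), (n * itemsize, itemsize), itemsize)  # separate array

print(needs_temporary(view, view_again))  # identical views
print(needs_temporary(view, shifted))     # partially overlapping views
print(needs_temporary(view, other))       # disjoint allocations
```

In this model only the partially overlapping pair reports that a temporary is needed; the identical pair and the disjoint pair do not.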

Previously:

In [7]: %time b_s[:,:-2] += a
CPU times: user 13.2 ms, sys: 24.7 ms, total: 37.9 ms
Wall time: 53 ms

Now:

In [7]: %time b_s[:,:-2] += a
CPU times: user 5.08 ms, sys: 9.58 ms, total: 14.7 ms
Wall time: 16.7 ms
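
The numbers above can be related with a little arithmetic. The sketch below parses the two `Device Timeline` lines from the trace in the description and computes the device-side duration of each command, together with the ratio of the two reported wall times. The regular expression and the helper name `device_duration_ns` are assumptions made for this illustration, not part of dpctl or onetrace. The traced run and the timed runs are separate invocations, so the comparison is only approximate.

```python
import re

# The two timeline entries copied from the trace in the PR description.
TRACE_LINES = [
    "Device Timeline (queue: 0x556080b9cea0): zeCommandListAppendMemoryCopy(H2D)"
    "[48 bytes]<4.1> [ns] = 16946404661 (append) 16952292497 (submit) "
    "16952613747 (start) 16952623538 (end)",
    "Device Timeline (queue: 0x556080b9cea0): "
    "dpctl::tensor::kernels::add::add_inplace_strided_kernel<float, float, "
    "dpctl::tensor::offset_utils::TwoOffsets_StridedIndexer>"
    "[SIMD32 {64800; 1; 1} {512; 1; 1}]<5.1> [ns] = 17017855801 (append) "
    "17018342202 (submit) 17019138920 (start) 17030770482 (end)",
]

# Matches "<timestamp> (<event>)" pairs at the end of a timeline entry.
STAMP = re.compile(r"(\d+) \((append|submit|start|end)\)")


def device_duration_ns(line: str) -> int:
    """Nanoseconds between the (start) and (end) timestamps of one entry."""
    stamps = {name: int(value) for value, name in STAMP.findall(line)}
    return stamps["end"] - stamps["start"]


copy_ns, kernel_ns = (device_duration_ns(line) for line in TRACE_LINES)

# Wall times reported by %time before and after the change, in milliseconds.
wall_before_ms, wall_after_ms = 53.0, 16.7
speedup = wall_before_ms / wall_after_ms

print(f"memory copy: {copy_ns / 1e3:.1f} us")
print(f"add kernel:  {kernel_ns / 1e6:.2f} ms")
print(f"wall-time speedup: {speedup:.1f}x")
```

By this estimate the in-place add kernel accounts for roughly 11.6 ms on the device, which suggests that most of the remaining 16.7 ms wall time is the kernel itself.
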
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

@AlexanderKalistratov

Shouldn't it also fix sqrt with 'out' for pairwise distance?

@oleksandr-pavlyk oleksandr-pavlyk force-pushed the improve-overlap-check-in-copy branch from 17a2623 to 701c05b Compare July 17, 2023 14:24
@github-actions

Array API standard conformance tests for dpctl=0.14.5dev1=py310h7bf5fec_11 ran successfully.
Passed: 448
Failed: 552
Skipped: 119

@oleksandr-pavlyk oleksandr-pavlyk merged commit a6d16f2 into master Jul 17, 2023
@oleksandr-pavlyk oleksandr-pavlyk deleted the improve-overlap-check-in-copy branch July 17, 2023 17:43
@github-actions

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

4 participants