Improve queue compatibility testing #900

Merged
oleksandr-pavlyk merged 8 commits into master from improve-queue-compatibility-testing on Sep 9, 2022

oleksandr-pavlyk
Collaborator

This introduces dpctl::utils::queues_are_compatible(exec_q, {alloc_q1, alloc_q2, ...}) and deploys it in tensor_py.cpp.

The check changes the notion of compatibility from "queues have equal contexts" to "queues are equal", aligning it with the logic in _compute_follows_data.pyx.
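A minimal sketch of what an equality-based check of this kind could look like is given below. It is not the code from this PR: `FakeQueue` is a hypothetical stand-in for `sycl::queue` so the example compiles without a SYCL toolchain, and the `sketch` namespace and the template signature are assumptions chosen to support the `{alloc_q1, alloc_q2, ...}` call syntax.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for sycl::queue; only equality comparison matters here.
struct FakeQueue {
    int id;
    friend bool operator==(const FakeQueue &a, const FakeQueue &b)
    {
        return a.id == b.id;
    }
};

// Returns true only if every allocation queue compares equal to the
// execution queue. Taking the list as a reference to an array lets the
// caller pass a braced list such as {alloc_q1, alloc_q2}.
template <typename Queue, std::size_t N>
bool queues_are_compatible(const Queue &exec_q, const Queue (&alloc_qs)[N])
{
    for (std::size_t i = 0; i < N; ++i) {
        if (!(exec_q == alloc_qs[i])) {
            return false;
        }
    }
    return true;
}

} // namespace sketch
```

With this sketch, `sketch::queues_are_compatible(exec_q, {alloc_q1, alloc_q2})` is true only when both allocation queues compare equal to `exec_q`. Comparing queues rather than contexts is the stricter test: two distinct queues that share a context would pass the old check but fail this one.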

Furthermore, this PR implements optimization for transfer of shape/strides from host-allocated metadata to kernels.
Previously, the transfer was done with 3 or 4 calls to exec_q.copy. Now a host temporary is allocated and populated with the necessary std::copy calls, followed by a single call to exec_q.copy that copies the data into the USM allocation for use in the kernel.
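The host-side packing step can be sketched as follows. This is an illustration under stated assumptions rather than the code in tensor_py.cpp: the function name `pack_shape_strides`, the use of `std::ptrdiff_t` as the element type, and the `[shape | src_strides | dst_strides]` layout are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Packs three equally sized metadata arrays into one contiguous host
// vector laid out as [shape | src_strides | dst_strides] (assumed layout),
// so that one copy can move all of it to the device.
std::vector<std::ptrdiff_t>
pack_shape_strides(const std::vector<std::ptrdiff_t> &shape,
                   const std::vector<std::ptrdiff_t> &src_strides,
                   const std::vector<std::ptrdiff_t> &dst_strides)
{
    const std::size_t nd = shape.size();
    if (src_strides.size() != nd || dst_strides.size() != nd) {
        throw std::invalid_argument("shape and strides must have equal length");
    }

    std::vector<std::ptrdiff_t> packed(3 * nd);
    std::copy(shape.begin(), shape.end(), packed.begin());
    std::copy(src_strides.begin(), src_strides.end(), packed.begin() + nd);
    std::copy(dst_strides.begin(), dst_strides.end(), packed.begin() + 2 * nd);
    return packed;
}
```

The packed vector would then be transferred with one call such as `exec_q.copy<std::ptrdiff_t>(packed.data(), usm_ptr, packed.size())`, where `usm_ptr` is a hypothetical pointer to a USM allocation of at least `3 * nd` elements. Because the copy is asynchronous, the host vector has to stay alive until the returned event completes.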

No other logic has changed.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?

Usage: queues_are_compatible(exec_q, {alloc_q1, alloc_q2, ...}).
Returns true if compatible, false otherwise.
Instead of invoking 4 copy kernels, it is more expedient to pack the
metadata on the host and use a single copy kernel to reduce kernel
submission overhead.
Applied the same optimization to the copy_and_cast kernel: the 3
queue.copy calls that copied shape, src_strides, and dst_strides host
metadata into the USM allocation are replaced by building a packed
vector on the host and issuing a single queue.copy call from the packed
host vector to the USM allocation.
github-actions bot commented Sep 7, 2022

Collaborator

coveralls commented Sep 7, 2022

Coverage remained the same at 81.872% when pulling 74f0b37 on improve-queue-compatibility-testing into 09b2555 on master.

Contributor

@diptorupd diptorupd left a comment

I am fine with this in general, but the copy_usm_ndarray_into_usm_ndarray function has become too big. I feel a refactor is required for readability.

… for copy-and-cast operation between two usm_ndarrays
@oleksandr-pavlyk oleksandr-pavlyk force-pushed the improve-queue-compatibility-testing branch from ffb42f9 to 74f0b37 Compare September 8, 2022 15:52
@oleksandr-pavlyk
Collaborator Author

Merging.

@oleksandr-pavlyk oleksandr-pavlyk merged commit 3d79ef9 into master Sep 9, 2022
@oleksandr-pavlyk oleksandr-pavlyk deleted the improve-queue-compatibility-testing branch September 9, 2022 23:18
github-actions bot commented Sep 9, 2022

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

github-actions bot commented Sep 9, 2022

Array API standard conformance tests failed to run for dpctl=0.14.0dev0=py310h8c27c75_78.
