Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements top_k functions in dpctl.tensor #1921

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Conversation

ndgrigorian
Copy link
Collaborator

@ndgrigorian ndgrigorian commented Dec 4, 2024

This PR implements the functions top_k, top_k_indices, and top_k_values as per proposal in array API spec.

Radix and merge sorting are used, and modified merge-sort kernels are introduced which sort the array in chunks and write out to a temporary the k largest or smallest values.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

Copy link

github-actions bot commented Dec 4, 2024

Copy link

github-actions bot commented Dec 4, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_296 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

@coveralls
Copy link
Collaborator

coveralls commented Dec 4, 2024

Coverage Status

coverage: 87.716% (+0.06%) from 87.659%
when pulling 2865f27 on feature/topk
into 39a19c1 on master.

@oleksandr-pavlyk
Copy link
Collaborator

@ndgrigorian Please add top_k to the docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst:

diff --git a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
index ae1605d988..ef20f4654c 100644
--- a/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
+++ b/docs/doc_sources/api_reference/dpctl/tensor.sorting_functions.rst
@@ -10,3 +10,4 @@ Sorting functions

    argsort
    sort
+   top_k

Copy link

github-actions bot commented Dec 5, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_295 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Copy link

github-actions bot commented Dec 6, 2024

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_297 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_326 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_327 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian ndgrigorian marked this pull request as ready for review December 12, 2024 09:24
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian ndgrigorian force-pushed the feature/topk branch 2 times, most recently from 8bcb100 to 8f38b80 Compare December 13, 2024 01:00
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_331 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_388 ran successfully.
Passed: 893
Failed: 3
Skipped: 118

@ndgrigorian ndgrigorian force-pushed the feature/topk branch 3 times, most recently from 775b423 to 84d1388 Compare December 28, 2024 01:00
The implementation leverages existing merge-sort code, and partially sorts the array in cases where a parial sort reduces the size of temporary memory allocation
Reduces amount of casting. `k` will need to fit in `py::ssize_t` regardless.
Instead of using an overload to handle the `axis=None` case, use std::optional and check for trailing_dims_to_search in validation logic
Factored out map_back_impl projects indexing from flat index to a
row-wise index.

Removed dead code excluded by preprocessor conditional.
Replaced it with hand-written implementation of ceil_log2(n),
such that n <= (dectype(n){1} << ceil_log2(n)) is true for all
positive values of `n` in the range.
Add check of computed against expected indices
One asserts that at least one unique pointer is specified.
Another that specified arguments are unique pointers with
USMDeleter.
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_391 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_385 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_386 ran successfully.
Passed: 895
Failed: 1
Skipped: 118

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_387 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian
Copy link
Collaborator Author

@oleksandr-pavlyk
I've added a skip for both the smallest and largest tests with int8

@oleksandr-pavlyk
Copy link
Collaborator

Good to see the CI green again!

I was suggesting to only skip it in the test command used in the workflow so that we can still provide a reproducer to the CPU team.

I was thinking we could add a file of tests to skip, and pass it as argument to pytest.

I think this is the approach taken by dpnp.

gid-lane_id is already a multiple of sg_size.
Change kernel to process few data elements in the work-item.
Counters can not exceed uint16_t max, because the kernel
assumes that the number of elements to sort fits into uint16_t.
The change reduces the kernel SLM footprint.

Also, remove use of std::move, uint16_t->std::uint16_t, etc

Replace size_t->std::size_t, uint32_t->std::uint32_t

Use `if constexpr` in order-preservign-cast for better readability.
The team developing OpenCL:CPU device runtime and compiler was notified.
See CMPLRLLVM-64592

Once fixed, the work-around should be removed.
was applied in C++.

Add tests for 2d input arrays, for axis=0 and axis=1

Add a test for non-contiguous input, 0d input, validation

100% coverage of top_k function implementation achieved
Copy link
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is ready to go in.

Thank you for working on this feature, @ndgrigorian

Any additional changes can be deferred to future PRs.

Copy link

github-actions bot commented Jan 4, 2025

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_399 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@oleksandr-pavlyk
Copy link
Collaborator

I'd suggest rebasing the branch on top of the targeted base branch to remove two cherry-picked commits fixing the workflow for building with nightly DPC++ bundle

@oleksandr-pavlyk
Copy link
Collaborator

Also, to make GH rule checker happy, add a line to changelog, so that your commit is the last one, otherwise my approve does not count

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants