-
-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Vectorize np.partition and np.argpartition using AVX-512 #24201
Conversation
Looks like all the CI failures are related to 2 tests:
where both the tests expect the output of |
I guess that problem solved itself when we vectorize both partition and argpartition :) |
Benchmarks for
|
inline bool quickselect_dispatch(T* v, npy_intp num, npy_intp kth) | ||
{ | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows will never get the faster version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quicksort patch ran into some problem with WIN32 builds and I never quite figured it out. Let me try running the partition patch through the CI, may be this one has better luck on windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this one had no trouble. Windows will get the faster version too :) Might be worth checking if quicksort can be enabled, but will do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to work. Should we circle back to enabling AVX512 quicksort for windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, separate PR would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AVX512 quicksort, argsort, partition and argpartition are all enabled on windows (64-bit only though).
LGTM. I did not review the subreo. @seiko2plus any thoughts? |
Rebasing with main. |
Something went wrong with the Travis CI, I think unrelated to this patch. |
Yes, unrelated. That failure happens now and then, the give away is "and 1 action required checks". |
bah, that Travis CI failed again. |
0029799
to
1cd3d8b
Compare
friendly ping :) |
89f58fc
to
eefc2b6
Compare
29cd42b
to
97e6714
Compare
Phew! Ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great performance improvement, Thank you! Just need to fix the float16 dispatching. Although our current tests should catch it, unfortunately, none of our CI workers support AVX512/ICL at least, also intel SDE still broken.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
kth = np.float16(0.1558), k = np.int64(29)
arr_part = array([ 0.001953, 0.007324, 0.01172 , 0.012695, 0.0249 , 0.0381 ,
0.03857 , 0.07666 , 0.07764 , 0.07... , -0.2622 , -0.2915 , -0.3003 ,
-0.3047 , -0.3323 , -0.3682 , -0.4297 , -0.4526 ],
dtype=float16)
def assert_arr_partitioned(kth, k, arr_part):
> assert_equal(arr_part[k], kth)
E AssertionError:
E Items are not equal:
E ACTUAL: np.float16(-0.2046)
E DESIRED: np.float16(0.1558)
arr_part = array([ 0.001953, 0.007324, 0.01172 , 0.012695, 0.0249 , 0.0381 ,
0.03857 , 0.07666 , 0.07764 , 0.07... , -0.2622 , -0.2915 , -0.3003 ,
-0.3047 , -0.3323 , -0.3682 , -0.4297 , -0.4526 ],
dtype=float16)
k = np.int64(29)
kth = np.float16(0.1558)
numpy/core/tests/test_multiarray.py:51: AssertionError
================ short test summary info ================
FAILED numpy/core/tests/test_multiarray.py::test_partition_fp[float16-N0]
FAILED numpy/core/tests/test_multiarray.py::test_partition_fp[float16-N1]
....
1087f45
to
6cd06c0
Compare
@seiko2plus Fixed the bug and passes all the tests on a Tigerlake. We should have an SDE version with the bug fix out soon and we should run CI on TGL and SPR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thank you!
Thanks @r-devulap |
Provides a speed up of 25x for 16-bit, 17x for 32-bit dtypes and about 8x speed up for 64-bit dtypes. Benchmark numbers on an array of size 100000 for varying values of k (10, 100 and 1000).
Benchmark for random data on Intel TGL:
Benchmark numbers on other kinds of array can be seen here.