Work-around for issue with CPU driver, add tests #1951
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_388 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_389 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_390 ran successfully.
@ndgrigorian For some reason
Yes, and it works fine in the coverage CI as well.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_391 ran successfully.
967aa6f to d135ac1
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_392 ran successfully.
7d368b0 to 178d677
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_394 ran successfully.
178d677 to c30c40d
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_393 ran successfully.
c30c40d to ab4fbe2
ab4fbe2 to 5f096b8
@ndgrigorian The CI is green now
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_396 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_399 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_400 ran successfully.
All lines of I think this PR is ready to be merged into the targeted base branch. Let me only rebase to squash some commits. (EDIT: Done)
919e82e to 7abdc56
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_398 ran successfully.
…tly SYCL bundle DPC++ compiler
gid-lane_id is already a multiple of sg_size.
Change kernel to process few data elements in the work-item.
Counters cannot exceed the uint16_t maximum, because the kernel assumes that the number of elements to sort fits into uint16_t. The change reduces the kernel SLM footprint. Also, remove use of std::move, replace uint16_t->std::uint16_t, etc. Replace size_t->std::size_t, uint32_t->std::uint32_t. Use `if constexpr` in order-preserving-cast for better readability.
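For readers unfamiliar with the idea, an order-preserving cast written with `if constexpr` might look roughly like the sketch below. This is an illustration only, not the code from this PR: the function name `order_preserving_cast`, its signature, and the restriction to integral types are assumptions made for the example.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not the dpctl implementation): maps an integral value
// to an unsigned key so that comparing keys as unsigned integers agrees with
// the natural ordering of the original values.
template <typename T>
constexpr auto order_preserving_cast(T val)
{
    static_assert(std::is_integral_v<T>,
                  "this sketch covers integral types only");

    if constexpr (std::is_same_v<T, bool>) {
        // false < true already holds for 0 < 1
        return static_cast<std::uint8_t>(val);
    }
    else if constexpr (std::is_unsigned_v<T>) {
        // unsigned values are already ordered as required
        return val;
    }
    else {
        using UnsignedT = std::make_unsigned_t<T>;
        // Flipping the sign bit moves negative values below non-negative ones
        constexpr UnsignedT sign_bit =
            UnsignedT(1) << (std::numeric_limits<UnsignedT>::digits - 1);
        return static_cast<UnsignedT>(static_cast<UnsignedT>(val) ^ sign_bit);
    }
}
```

Each branch is selected at compile time, so only the code relevant to `T` is instantiated, which is what makes the `if constexpr` form easier to read than a set of overloads or specializations.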
The team developing the OpenCL:CPU device runtime and compiler was notified. See CMPLRLLVM-64592. Once fixed, the work-around should be removed.
was applied in C++. Add tests for 2d input arrays, for axis=0 and axis=1. Add a test for non-contiguous input, 0d input, validation. 100% coverage of top_k function implementation achieved.
b9dfea9 to e1b7540
Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_395 ran successfully.
Tests now passing, LGTM!
Due to a reported issue with the OpenCL:CPU device implementation, the kernel for short inputs and short signed integral types may produce incorrect results when compiled for a short SIMD width (as chosen automatically for the AMD EPYC 7763 64-Core Processor CPU).
A work-around is to introduce a redundant barrier call (only for CPU devices, and only when the sub-group size is short).
Edits made while triaging the issue are also applied: counters for short input arrays are stored in 16-bit unsigned integers rather than in 32-bit ones, `uint16_t` is replaced with `std::uint16_t`, etc. Tests for `tensor.top_k` were expanded to ensure 100% coverage.
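As a rough illustration of why 16-bit counters reduce the local-memory footprint, the sketch below compares the storage needed for the two counter widths and states the condition under which the narrower type is safe. The counter count `n_counters = 256` and both helper names are assumptions for the example; the actual number of counters used by the kernel is not stated here.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical number of counters kept in local memory (assumed for
// illustration; the actual kernel may use a different count).
constexpr std::size_t n_counters = 256;

// Bytes needed to store n_counters counters of type CounterT.
template <typename CounterT>
constexpr std::size_t counters_footprint_bytes()
{
    return n_counters * sizeof(CounterT);
}

// A 16-bit counter is safe only if no count can exceed its maximum,
// i.e. if the number of elements to sort fits into std::uint16_t.
constexpr bool fits_in_16bit_counter(std::size_t n_elems)
{
    return n_elems <= std::numeric_limits<std::uint16_t>::max();
}

// Switching from 32-bit to 16-bit counters halves the footprint.
static_assert(2 * counters_footprint_bytes<std::uint16_t>() ==
              counters_footprint_bytes<std::uint32_t>());
```

Under these assumptions the counters take 512 bytes instead of 1024, and the narrower type is valid for any input of at most 65535 elements.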