Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try/catch wrap a call to deallocate in silent usm_host_allocator class to be used by std::vector for array metadata transfers #1791

Merged

Conversation

oleksandr-pavlyk
Copy link
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk commented Aug 7, 2024

Introduced class template <typename T> dpctl::tensor::offset_utils::usm_host_allocator deriving from sycl::usm_allocator<T, sycl::alloc::kind::host> that wraps the call to base::deallocate in try/catch to prevent crashes due to seemingly benign exceptions thrown by call to sycl::free by CPU device runtime which are under investigation.

In case such an exception is caught, a message is printed to std::cerr, but the exception is otherwise ignored.

Run pytest with -s for testing with nightly sycl bundle to be able to see such message printed.

Also remove use of --no-sycl-interface-test option, since DPCTLSyclInterface library is no longer so-versioned.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

Copy link

github-actions bot commented Aug 7, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@oleksandr-pavlyk oleksandr-pavlyk changed the title Run pytest with -s for testign with nightly sycl bundle Run pytest with -s for testing with nightly sycl bundle Aug 7, 2024
@coveralls
Copy link
Collaborator

coveralls commented Aug 7, 2024

Coverage Status

coverage: 87.93%. remained the same
when pulling a49bd76 on run-os-llvm-dpctl-tests-without-stdout-stderr-capture
into ff0d4ea on master.

Copy link

github-actions bot commented Aug 7, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_256 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@oleksandr-pavlyk
Copy link
Collaborator Author

The log of test run with -s option does not show message about exception caught during call to sycl::free during execution of the crashing test.

Testing with SYCL OS build internally, the crash is occurring in test_matmul_nilpotent1[i1] in unified runtime library. This one must be different, specific to the architecture of the runner's CPU. It may, perhaps, be masking the problem with
test_matmul_nilpotent1[i1].

I will try to reproduce the issue though, as it keeps consistently reoccurring at the same test.

Also remove --no-sycl-interface-test option, since DPCTLSyclInterface
library is no longer so-versioned.
Wrote dpctl::tensor::offset_utils::usm_host_allocator<T> to allocate
USM-host memory as storage to std::vector.

Replaced uses of sycl::usm_memory<T, sycl::alloc::kind::host>. The
new class derives from this, but overrides deallocate method to
wrap call to base::deallocate in try/except. The exception, if
caught, is printed but otherwise ignored, consistent like this is
done on USMDeleter class used in dpctl.memory

This is to work around sporadic crashes due to unhandled exception
thrown by openCL::CPU driver, which appears to be benign.

The issue was reported to CPU driver team, with native reproducer
(compiler LLVM jira ticket 58387).
@oleksandr-pavlyk oleksandr-pavlyk force-pushed the run-os-llvm-dpctl-tests-without-stdout-stderr-capture branch from ccbd886 to 709b6bd Compare August 8, 2024 13:25
Copy link

github-actions bot commented Aug 8, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_265 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

Copy link

github-actions bot commented Aug 8, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_266 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@oleksandr-pavlyk oleksandr-pavlyk changed the title Run pytest with -s for testing with nightly sycl bundle Try/catch wrap a call to deallocate in silent usm_host_allocator class to be used by std::vector for array metadata transfers Aug 8, 2024
These were from previous year. Updated them to what DPC++ is using

https://github.com/intel/llvm/blob/sycl/devops/dependencies.json#L27-L38

It might be nice to automate update these through some cron executed
workflow.
Copy link

github-actions bot commented Aug 9, 2024

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_276 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly build passes now, so I think it would be good to get this in, looks good to me!

@oleksandr-pavlyk oleksandr-pavlyk merged commit 5f3b001 into master Aug 9, 2024
22 of 30 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the run-os-llvm-dpctl-tests-without-stdout-stderr-capture branch August 9, 2024 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants