Try/catch wrap a call to deallocate in silent usm_host_allocator class to be used by std::vector for array metadata transfers #1791
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_256 ran successfully.
The log of test run with Testing with SYCL OS build internally, the crash is occurring in I will try to reproduce the issue though, as it keeps consistently reoccurring at the same test. |
Also remove --no-sycl-interface-test option, since DPCTLSyclInterface library is no longer so-versioned.
Wrote dpctl::tensor::offset_utils::usm_host_allocator<T> to allocate USM-host memory as storage for std::vector. Replaced uses of sycl::usm_allocator<T, sycl::usm::alloc::host>. The new class derives from this, but overrides the deallocate method to wrap the call to base::deallocate in try/catch. The exception, if caught, is printed but otherwise ignored, consistent with how this is done in the USMDeleter class used in dpctl.memory. This is to work around sporadic crashes due to an unhandled exception thrown by the OpenCL CPU driver, which appears to be benign. The issue was reported to the CPU driver team, with a native reproducer (compiler LLVM jira ticket 58387).
ccbd886 to 709b6bd
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_265 ran successfully.
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_266 ran successfully.
These were from the previous year. Updated them to what DPC++ is using: https://github.com/intel/llvm/blob/sycl/devops/dependencies.json#L27-L38. It might be nice to automate updating these through a cron-executed workflow.
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_276 ran successfully.
Nightly build passes now, so I think it would be good to get this in, looks good to me!
Introduced class template <typename T> dpctl::tensor::offset_utils::usm_host_allocator deriving from sycl::usm_allocator<T, sycl::usm::alloc::host> that wraps the call to base::deallocate in try/catch to prevent crashes due to seemingly benign exceptions thrown by the call to sycl::free by the CPU device runtime, which are under investigation.
In case such an exception is caught, a message is printed to std::cerr, but the exception is otherwise ignored.
Run pytest with -s when testing with the nightly sycl bundle to be able to see such a message printed.
Also removed use of the --no-sycl-interface-test option, since the DPCTLSyclInterface library is no longer so-versioned.
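For readers who want to see the shape of the pattern, here is a minimal sketch rather than the code from this PR. It builds without SYCL: throwing_allocator is a hypothetical stand-in for sycl::usm_allocator<T, sycl::usm::alloc::host> whose deallocate always throws, and quiet_host_allocator, suppressed_count and run_demo are illustrative names that do not exist in dpctl.

```cpp
#include <cstddef>
#include <exception>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the SYCL USM host allocator (an assumption made
// so the sketch builds without SYCL). Its deallocate releases the memory and
// then throws, imitating a runtime that fails during free.
template <typename T>
struct throwing_allocator {
    using value_type = T;

    throwing_allocator() = default;
    template <typename U>
    throwing_allocator(const throwing_allocator<U> &) noexcept {}

    T *allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }

    void deallocate(T *p, std::size_t n) {
        std::allocator<T>{}.deallocate(p, n);
        throw std::runtime_error("simulated failure while freeing memory");
    }
};

// Illustrative counter so that callers can observe suppressed exceptions.
inline int suppressed_count = 0;

// Derived allocator: the same allocation behaviour as the base, but
// deallocate catches the exception, reports it to std::cerr and ignores it.
template <typename T>
class quiet_host_allocator : public throwing_allocator<T> {
    using base_t = throwing_allocator<T>;

public:
    using value_type = T;

    quiet_host_allocator() = default;
    template <typename U>
    quiet_host_allocator(const quiet_host_allocator<U> &) noexcept {}

    // Hides base_t::deallocate; std::allocator_traits calls this version.
    void deallocate(T *p, std::size_t n) noexcept {
        try {
            base_t::deallocate(p, n);
        } catch (const std::exception &e) {
            std::cerr << "Exception caught in deallocate: " << e.what()
                      << std::endl;
            ++suppressed_count;
        }
    }
};

template <typename T, typename U>
bool operator==(const quiet_host_allocator<T> &,
                const quiet_host_allocator<U> &) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const quiet_host_allocator<T> &,
                const quiet_host_allocator<U> &) noexcept { return false; }

// Builds a small metadata vector and lets it go out of scope. Returns how
// many exceptions were swallowed while its storage was released.
inline int run_demo() {
    suppressed_count = 0;
    {
        std::vector<std::ptrdiff_t, quiet_host_allocator<std::ptrdiff_t>>
            shape_strides(6, 1);
    } // the vector destructor calls deallocate here
    return suppressed_count;
}
```

Calling run_demo() from main prints the caught message to std::cerr and returns normally. Because a container destructor is implicitly noexcept, the same vector built on the throwing base allocator would end the program through std::terminate. If the base allocator declares its own rebind member, the derived class would likely need to redeclare it, otherwise containers that rebind could silently fall back to the base type.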