State of interoperability with cuda backend #947

Open
fcharras opened this issue Oct 19, 2022 · 10 comments

@fcharras
Contributor

dpctl can detect cuda devices:

In [2]: import dpctl

In [3]: dpctl.get_devices()
Out[3]: [<dpctl.SyclDevice [backend_type.cuda, device_type.gpu,  NVIDIA GeForce GTX 1070] at 0x7fa92ca4a270>]

but dpctl.program can't create kernels for them; only the level_zero and opencl backends are supported.

Are there plans for further interoperability from dpctl with backends other than opencl and level_zero?
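
For reference, this is roughly the call that hits the limitation (a minimal sketch: the kernel source is an arbitrary placeholder, and create_program_from_source is the dpctl.program entry point for OpenCL C sources):

import dpctl
import dpctl.program

# Queue on the first available device (the CUDA GPU from the session above).
q = dpctl.SyclQueue(dpctl.get_devices()[0])

# OpenCL C source for a trivial kernel.
src = "__kernel void twice(__global int *a) { size_t i = get_global_id(0); a[i] *= 2; }"

# Succeeds for opencl and level_zero queues; raises for a CUDA-backed
# queue, which is the limitation discussed here.
prog = dpctl.program.create_program_from_source(q, src)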

@oleksandr-pavlyk
Collaborator

oleksandr-pavlyk commented Oct 19, 2022

Yes, this is a documented limitation of dpctl.program. Open-source SYCL (compiled from the intel/llvm sources with CUDA support configured) allows creating interoperability SYCL kernel bundles from a byte buffer filled with PTX byte-code.

Constructing the CUDA object underlying the kernel bundle requires CUDA, and so should be done outside of dpctl.

Once constructed, the interoperability kernel bundle can be represented by dpctl.program.SyclProgram.
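
To make the division of labor concrete, here is a hedged sketch of the part that lives outside of dpctl, assuming NVIDIA's cuda-python bindings (the PTX content is a placeholder, and the C++ wrapping step named in the comments is an assumption based on the SYCL 2020 interop API):

from cuda import cuda

ptx = b"...PTX produced elsewhere..."  # placeholder byte buffer

# Minimal driver setup; error checking elided (every cuda-python call
# returns a (CUresult, value) tuple).
cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
err, ctx = cuda.cuCtxCreate(0, dev)

# This CUmodule is the "CUDA object underlying the kernel bundle"; on the
# C++ side it would be wrapped via the SYCL 2020 interop entry point
# (e.g. sycl::make_kernel_bundle for the CUDA backend) and only then
# surfaced to Python as a dpctl.program.SyclProgram.
err, module = cuda.cuModuleLoadData(ptx)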

@diptorupd
Contributor

@fcharras what SYCL runtime are you using? Did you build the open-source Intel LLVM SYCL toolchain to try it with dpctl?

If there is interest, the dpctl_sycl_kernel_bundle_interface can be extended to support CUDA as well. For us, the quandary has always been that testing and maintaining such an interface would also mean having to build and maintain a dpcpp runtime with CUDA support.

@fcharras
Contributor Author

> Did you build the open-source Intel LLVM SYCL toolchain to try it with dpctl?

Yes. I was investigating the potential for interoperability of dpctl / dpnp / numba_dpex. Having the same Python code be distributed and run in a hardware-agnostic way would be such a good feature. Also, from a developer's perspective, it's important to know early what has the potential to be portable and what hasn't when choosing the libraries to build on; not only to cuda but also hip/amd, so we can target the largest scope of users. If we get working POCs we would also consider CI setups for those backends.

From what I've gathered so far, dpctl and numba_dpex interoperability is not out of reach and depends on extending dpctl_sycl_kernel_bundle_interface. I could contribute myself, but at the moment I might lack the required knowledge. On the other hand, I believe kernels in dpnp have less potential of running on other devices? That would be good to know, since we have started to use dpnp primitives (e.g. dpnp.partition) as shortcuts.
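
For concreteness, the kind of shortcut meant here is e.g. the following (a minimal sketch; dpnp mirrors the numpy API, so dpnp.partition is assumed to behave like numpy.partition):

import dpnp

x = dpnp.asarray([7, 1, 5, 3, 9, 2])

# The element at index 2 lands in its sorted position; smaller elements
# precede it and larger ones follow (order within each side unspecified),
# computed on the array's SYCL device.
y = dpnp.partition(x, 2)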

@diptorupd
Contributor

@fcharras Thanks for the clarification.

The type of interoperability you have in mind has been on my TODO list for a while. Here is a basic outline of how I think the portability/interoperability would work from the perspective of numba_dpex and dpctl.

  • Extend the kernel decorator for dpex to generate PTX, as mentioned in Using the nvidia opencl runtime intel/llvm#7114 (a rough sketch follows this list).
    • Having this support is not too hard, as we already have numba.cuda; we will need some front-end overloads to make sure dpex.get_global_id gets translated to numba.cuda.threadIdx etc., and after that we can use the numba.cuda pipeline to generate the PTX.
  • Once we can generate PTX for a numba_dpex.kernel, we will have to compile it into a SYCL kernel bundle. The work will involve extending dpctl_sycl_kernel_bundle_interface to support CUDA.
  • After that, all that remains is to provide a package for a dpcpp runtime that supports CUDA. The present dpcpp_cpp_rt package that Intel distributes only supports opencl and level_zero. We will need an alternate runtime package built from the open-source dpcpp toolchain with CUDA support.
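
A rough illustration of the first two bullets (a sketch only: the scale kernel and the by-hand mapping of dpex.get_global_id(0) to cuda.grid(1) are assumptions for illustration; the real work is making dpex perform this translation automatically):

from numba import cuda, float32

# How a dpex kernel body would look after the hypothetical front-end
# translation, with dpex.get_global_id(0) mapped to cuda.grid(1).
def scale(a, factor):
    i = cuda.grid(1)
    if i < a.size:
        a[i] *= factor

# Compile to PTX without launching; this PTX string is the artifact that
# an extended dpctl_sycl_kernel_bundle_interface would turn into a SYCL
# kernel bundle.
ptx, resty = cuda.compile_ptx(scale, (float32[:], float32))
print(ptx[:80])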

The bulk of the engineering time, IMO, will go into the first task of extending the kernel decorator, but it is not intractable. I have recently started to refactor and clean up the dpex kernel internals (IntelPython/numba-dpex#804); one of the unstated goals is in fact to make the interface modular enough to support CUDA or any other type of non-SPIR-V kernel.

The third bullet, packaging and distributing a dpcpp runtime with CUDA support, is the next challenge. That needs to be solved as a community effort.

@diptorupd
Contributor

diptorupd commented Oct 20, 2022

> On the other hand, I believe kernels in dpnp have less potential of running on other devices? That would be good to know, since we have started to use dpnp primitives (e.g. dpnp.partition) as shortcuts.

I will let @oleksandr-pavlyk confirm, but I do think even dpnp can be extended to CUDA. dpnp kernels either are pure SYCL implementations, which can be compiled for CUDA using the open-source dpcpp, or they use oneMKL, and oneMKL too has CUDA support.

Sasha and I at one point experimented with using the CUDA backend of oneMKL via the dpctl pybind11 interface. We had mixed luck. The real challenge was that, at the time, oneMKL for some reason could not support both CUDA and Level Zero at once; you needed to compile the code for one or the other. Things might have changed and improved since then. We have the example code for the oneMKL dpctl interface here: https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv
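
For orientation, the Python side of such an experiment would look roughly like this (a sketch: the onemkl_gemv module name is taken from the linked example directory, but the function name, its signature, and the "cuda:gpu" filter string are assumptions, and a CUDA-enabled dpcpp runtime is required):

import dpctl
import dpctl.tensor as dpt
import onemkl_gemv  # hypothetical extension built from the linked example

# A queue on the CUDA device routes the oneMKL call to its CUDA backend.
q = dpctl.SyclQueue("cuda:gpu")

A = dpt.ones((4, 4), dtype="float32", sycl_queue=q)
x = dpt.ones(4, dtype="float32", sycl_queue=q)
y = dpt.empty(4, dtype="float32", sycl_queue=q)

onemkl_gemv.gemv(q, A, x, y)  # hypothetical name and signature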

@fcharras
Contributor Author

fcharras commented Oct 21, 2022

I opened a separate issue for building dpnp IntelPython/dpnp#1206

About it using oneMKL, I'm not sure. I think I did successfully build oneMKL with GPU support, but the resulting libraries are named libonemkl*, while the MKL libraries required by dpnp are those found e.g. in the oneAPI Base Toolkit and named libmkl*; in particular, libmkl_sycl.so is required to pass the MATHLIB_FOUND check.

And regarding support for building with a custom dpcpp, I could not get past the RPATH error described in the issue.

@diptorupd
Contributor

@fcharras you are right, dpnp is not using the oneMKL interfaces but the MKL libraries directly.

@oleksandr-pavlyk We should explore what it would take to switch from MKL to oneMKL, as that would open up CUDA support.

@diptorupd
Contributor

@fcharras I did some initial exploration using the oneAPI plugin for NVIDIA GPUs. I feel that supporting CUDA, now that a dedicated CUDA plugin is available, is much simpler, at least in dpctl.

I have started a discussion in #1124; let us discuss further there.

@fcharras
Contributor Author

Awesome news!

diptorupd added the user label on Mar 1, 2024
@oleksandr-pavlyk
Collaborator

DPCTL can now be compiled targeting one of, or all of, the CUDA and HIP SYCL targets, in addition to the SPIR64 SYCL target.

This is documented at https://intelpython.github.io/dpctl/latest/beginners_guides/installation.html#building-for-custom-sycl-targets
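
A quick way to verify such a build (a sketch; it assumes dpctl was compiled with the CUDA target, an NVIDIA GPU is present, and that get_devices accepts a backend filter):

import dpctl

# Restrict enumeration to the CUDA backend; a SPIR64-only build yields
# an empty list here.
cuda_devices = dpctl.get_devices(backend="cuda")
print(cuda_devices)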
