State of interoperability with cuda backend #947

Open
fcharras opened this issue Oct 19, 2022 · 10 comments

@fcharras
Contributor

dpctl can detect cuda devices:

In [2]: import dpctl

In [3]: dpctl.get_devices()
Out[3]: [<dpctl.SyclDevice [backend_type.cuda, device_type.gpu,  NVIDIA GeForce GTX 1070] at 0x7fa92ca4a270>]

but dpctl.program can't create kernels for them; only the level_zero and opencl backends are supported.

Are there plans for further interoperability from dpctl with backends other than opencl and level_zero?
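
For reference, this is roughly the call that hits the limitation (a minimal sketch: the kernel source is an arbitrary placeholder, and create_program_from_source is the dpctl.program entry point for OpenCL C sources):

import dpctl
import dpctl.program

# Queue on the first available device (the CUDA GPU from the session above).
q = dpctl.SyclQueue(dpctl.get_devices()[0])

# OpenCL C source for a trivial kernel.
src = "__kernel void twice(__global int *a) { size_t i = get_global_id(0); a[i] *= 2; }"

# Succeeds for opencl and level_zero queues; raises for a CUDA-backed
# queue, which is the limitation discussed here.
prog = dpctl.program.create_program_from_source(q, src)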

@oleksandr-pavlyk
Collaborator

oleksandr-pavlyk commented Oct 19, 2022

Yes, this is a documented limitation of dpctl.program. Open-source SYCL (compiled from the intel/llvm sources with CUDA support configured) allows creating interoperability SYCL kernel bundles from a byte buffer filled with PTX byte-code.

Constructing the CUDA object underlying the kernel bundle requires CUDA, and so should be done outside of dpctl.

Once constructed, the interoperability kernel bundle can be represented by dpctl.program.SyclProgram.
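
To make the division of labor concrete, here is a hedged sketch of the part that lives outside of dpctl, assuming NVIDIA's cuda-python bindings (the PTX content is a placeholder, and the C++ wrapping step named in the comments is an assumption based on the SYCL 2020 interop API):

from cuda import cuda

ptx = b"...PTX produced elsewhere..."  # placeholder byte buffer

# Minimal driver setup; error checking elided (every cuda-python call
# returns a (CUresult, value) tuple).
cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
err, ctx = cuda.cuCtxCreate(0, dev)

# This CUmodule is the "CUDA object underlying the kernel bundle"; on the
# C++ side it would be wrapped via the SYCL 2020 interop entry point
# (e.g. sycl::make_kernel_bundle for the CUDA backend) and only then
# surfaced to Python as a dpctl.program.SyclProgram.
err, module = cuda.cuModuleLoadData(ptx)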

@diptorupd
Contributor

@fcharras what SYCL runtime are you using? Did you build the open-source Intel LLVM SYCL toolchain to try it with dpctl?

If there is interest, the dpctl_sycl_kernel_bundle_interface can be extended to support CUDA as well. For us, the quandary has always been that testing and maintaining such an interface would also mean having to build and maintain a dpcpp runtime with CUDA support.

@fcharras
Contributor Author

> Did you build the open-source Intel LLVM SYCL toolchain to try it with dpctl?

Yes. I was investigating the potential for interoperability of dpctl / dpnp / numba_dpex. Having the same Python code be distributed and run in a hardware-agnostic way would be such a good feature. Also, from a developer's perspective, it's important to know early what has the potential to be portable and what hasn't when choosing the libraries to build on; not only to cuda but also hip/amd, so we can target the largest scope of users. If we get working POCs we would also consider CI setups for those backends.

From what I've gathered so far, dpctl and numba_dpex interoperability is not out of reach and depends on extending dpctl_sycl_kernel_bundle_interface. I could contribute myself, but at the moment I might lack the required knowledge. On the other hand, I believe kernels in dpnp have less potential of running on other devices? That would be good to know, since we have started to use dpnp primitives (e.g. dpnp.partition) as shortcuts.
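
For concreteness, the kind of shortcut meant here is e.g. the following (a minimal sketch; dpnp mirrors the numpy API, so dpnp.partition is assumed to behave like numpy.partition):

import dpnp

x = dpnp.asarray([7, 1, 5, 3, 9, 2])

# The element at index 2 lands in its sorted position; smaller elements
# precede it and larger ones follow (order within each side unspecified),
# computed on the array's SYCL device.
y = dpnp.partition(x, 2)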

@diptorupd
Contributor

@fcharras Thanks for the clarification.

The type of interoperability you have in mind has been on my TODO list for a while. Here is a basic outline of how I think the portability/interoperability would work from the perspective of numba_dpex and dpctl.

  • Extend the kernel decorator for dpex to generate PTX, as mentioned in Using the nvidia opencl runtime intel/llvm#7114 (a rough sketch follows this list).
    • Having this support is not too hard, as we already have numba.cuda; we will need some front-end overloads to make sure dpex.get_global_id gets translated to numba.cuda.threadIdx etc., and after that we can use the numba.cuda pipeline to generate the PTX.
  • Once we can generate PTX for a numba_dpex.kernel, we will have to compile it into a SYCL kernel bundle. The work will involve extending dpctl_sycl_kernel_bundle_interface to support CUDA.
  • After that, all that remains is to provide a package for a dpcpp runtime that supports CUDA. The present dpcpp_cpp_rt package that Intel distributes only supports opencl and level_zero. We will need an alternate runtime package built from the open-source dpcpp toolchain with CUDA support.
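
A rough illustration of the first two bullets (a sketch only: the scale kernel and the by-hand mapping of dpex.get_global_id(0) to cuda.grid(1) are assumptions for illustration; the real work is making dpex perform this translation automatically):

from numba import cuda, float32

# How a dpex kernel body would look after the hypothetical front-end
# translation, with dpex.get_global_id(0) mapped to cuda.grid(1).
def scale(a, factor):
    i = cuda.grid(1)
    if i < a.size:
        a[i] *= factor

# Compile to PTX without launching; this PTX string is the artifact that
# an extended dpctl_sycl_kernel_bundle_interface would turn into a SYCL
# kernel bundle.
ptx, resty = cuda.compile_ptx(scale, (float32[:], float32))
print(ptx[:80])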

The bulk of the engineering time, IMO, will go into the first task of extending the kernel decorator, but it is not intractable. I have recently started to refactor and clean up the dpex kernel internals (IntelPython/numba-dpex#804); one of the unstated goals is in fact to make the interface modular enough to support CUDA or any other type of non-SPIR-V kernel.

The third bullet, packaging and distributing a dpcpp runtime with CUDA support, is the next challenge. That needs to be solved as a community effort.

@diptorupd
Contributor

diptorupd commented Oct 20, 2022

> On the other hand, I believe kernels in dpnp have less potential of running on other devices? That would be good to know, since we have started to use dpnp primitives (e.g. dpnp.partition) as shortcuts.

I will let @oleksandr-pavlyk confirm, but I do think even dpnp can be extended to CUDA. dpnp kernels either are pure SYCL implementations, which can be compiled for CUDA using the open-source dpcpp, or they use oneMKL, and oneMKL too has CUDA support.

Sasha and I at one point experimented with using the CUDA backend of oneMKL via the dpctl pybind11 interface. We had mixed luck. The real challenge was that, at the time, oneMKL for some reason could not support both CUDA and Level Zero at once; you needed to compile the code for one or the other. Things might have changed and improved since then. We have the example code for the oneMKL dpctl interface here: https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv
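
For orientation, the Python side of such an experiment would look roughly like this (a sketch: the onemkl_gemv module name is taken from the linked example directory, but the function name, its signature, and the "cuda:gpu" filter string are assumptions, and a CUDA-enabled dpcpp runtime is required):

import dpctl
import dpctl.tensor as dpt
import onemkl_gemv  # hypothetical extension built from the linked example

# A queue on the CUDA device routes the oneMKL call to its CUDA backend.
q = dpctl.SyclQueue("cuda:gpu")

A = dpt.ones((4, 4), dtype="float32", sycl_queue=q)
x = dpt.ones(4, dtype="float32", sycl_queue=q)
y = dpt.empty(4, dtype="float32", sycl_queue=q)

onemkl_gemv.gemv(q, A, x, y)  # hypothetical name and signature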

@fcharras
Contributor Author

fcharras commented Oct 21, 2022

I opened a separate issue for building dpnp IntelPython/dpnp#1206

About it using oneMKL, I'm not sure. I think I did successfully build oneMKL with GPU support, but the resulting libraries are named libonemkl*, while the MKL libraries required by dpnp are those found e.g. in the oneAPI Base Toolkit and named libmkl*; in particular, libmkl_sycl.so is required to pass the MATHLIB_FOUND check.

And regarding support for building with a custom dpcpp, I could not get past the RPATH error described in the issue.

@diptorupd
Contributor

@fcharras you are right, dpnp is not using the oneMKL interfaces but the MKL libraries directly.

@oleksandr-pavlyk We should explore what it would take to switch from MKL to oneMKL, as that would open up CUDA support.

@diptorupd
Contributor

@fcharras I did some initial exploration using the oneAPI plugin for NVIDIA GPUs. I feel that supporting CUDA, now that a dedicated CUDA plugin is available, is much simpler, at least in dpctl.

I have started a discussion in #1124; let us discuss further there.

@fcharras
Contributor Author

Awesome news!

diptorupd added the user label on Mar 1, 2024
@oleksandr-pavlyk
Collaborator

DPCTL can now be compiled targeting one of, or all of, the CUDA and HIP SYCL targets, in addition to the SPIR64 SYCL target.

This is documented at https://intelpython.github.io/dpctl/latest/beginners_guides/installation.html#building-for-custom-sycl-targets
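
A quick way to verify such a build (a sketch; it assumes dpctl was compiled with the CUDA target, an NVIDIA GPU is present, and that get_devices accepts a backend filter):

import dpctl

# Restrict enumeration to the CUDA backend; a SPIR64-only build yields
# an empty list here.
cuda_devices = dpctl.get_devices(backend="cuda")
print(cuda_devices)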
