Eager jitting does not work with numba_dpex (#816)
@fcharras I am addressing the issue in my ongoing refactoring of the dpex kernel decorator internals. In my current set of changes, I am deprecating support for NumPy ndarray as a kernel argument; instead, we will only support USM-based array types. Coming to the specialization issue: dpex does not yet define its own types for specialization, so we are dependent on the Numba types. Till we have a cleaner way to improve type support for USM arrays, the following is what I have in mind:
```python
import dpctl
from numba import void, int32
import numba_dpex as dpex

@dpex.kernel(
    func_or_sig=void(int32[:], int32[:], int32[:]),
    device_type=dpctl.device_type.gpu,
    backend=dpctl.backend.level_zero,
)
def vecadd(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]

import numpy as np
a = np.arange(100)
b = np.arange(100)
c = np.zeros_like(a)
vecadd[100](a, b, c)  # Exception raised

import dpnp
a = dpnp.arange(100)
b = dpnp.arange(100)
c = dpnp.zeros_like(a)
vecadd[100](a, b, c)  # Works provided the device and backend specializations match
```
Point 2 above is a hack to get around the missing proper type support in dpex for USM arrays. I am going to start that work next. Till then, how do you feel about the above approach?
For me that seems to work; one minor remark about how the device and backend are passed. To make sure that I understand correctly, with those changes:
I think it's good 👌
Ok, let us go with that. Yes, that is what I was proposing. After some more thinking and analyzing the existing code, I have identified gaps with the above approach. As you stated, my proposed approach needs to map the NumPy-typed Numba signature to dpex's USM array types behind the scenes. Instead, how about the following alternative:
```python
import dpctl
from numba import void, int32
from numba_dpex import usm_array
import numba_dpex as dpex

usmarrty = usm_array(dtype=int32, ndim=2, layout="C", usm_type="device", device="gpu")

@dpex.kernel(func_or_sig=void(usmarrty, usmarrty, usmarrty))
def vecadd(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]

import numpy as np
a = np.arange(100)
b = np.arange(100)
c = np.zeros_like(a)
vecadd[100](a, b, c)  # Exception raised

import dpnp
a = dpnp.arange(100)
b = dpnp.arange(100)
c = dpnp.zeros_like(a)
vecadd[100](a, b, c)  # Works provided the device and backend specializations match
```
I also think it's fine; I haven't met use cases requiring more than that. Also, there is maybe a middle-ground solution where both interfaces are supported, but the first syntax would create two entries in the `definitions` dictionary, one per type (a library-free sketch of that idea follows below). When I explored this API, the feature I was looking for was being able to retrieve the `preferred_work_group_size_multiple` of the compiled kernel.
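To make that middle ground concrete, here is a minimal, library-free sketch. The names (`definitions`, the string keys) are purely illustrative and are not numba_dpex internals; the point is only that registering the eagerly compiled kernel under both the NumPy-typed key and the USM-typed key lets the later USM-typed lookup reuse it instead of triggering a second compilation.

```python
# Toy specialization cache, for illustration only (not numba_dpex code).
definitions = {}

def eager_compile(np_key, usm_key):
    compiled = f"kernel compiled once for {np_key}"
    # Middle ground: register the same compiled kernel under both keys so a
    # later call with USM-typed arguments reuses it instead of recompiling.
    definitions[np_key] = compiled
    definitions[usm_key] = compiled

def call(usm_key):
    if usm_key not in definitions:  # a miss here would mean a lazy recompilation
        definitions[usm_key] = f"kernel compiled again for {usm_key}"
    return definitions[usm_key]

eager_compile("Array(int32, 1d, C)", "USMNdArrayType(int32, 1d, C)")
print(call("USMNdArrayType(int32, 1d, C)"))  # hits the eagerly compiled entry
print(len(definitions))                      # 2 keys, only 1 compilation
```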
@fcharras I have pushed to main changes that address the eager compilation issue for kernels. A separate PR that addresses the same issue in other parts of dpex is underway. The main changes are:
The docs have not yet been updated, but you can refer to https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/examples/kernel/kernel_specialization.py. It will be good to get your feedback.
👍 for the API and usage, this is excellent progress, TY! But unfortunately, for me the new main branch crashes (I see segfaults). I'd be curious to know whether the new code (I haven't read it yet) would make the following usage easier:
I also see segfaults, which makes me think that something is wrong with local memory.
The problem most likely is that, without realizing the impact, I changed the way the global and local range arguments are passed to the kernel launch expression.
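For context, a minimal sketch of the two launch spellings that carry those ranges; which forms are accepted, and how the ranges are forwarded, has changed between numba_dpex versions, so treat this as illustrative rather than the definitive API:

```python
# Illustrative launch spellings for a kernel such as vecadd defined earlier;
# the indexing expression is what forwards the global/local ranges.
vecadd[100](a, b, c)       # global range only
vecadd[100, 20](a, b, c)   # global range plus local (work-group) range
```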
Possibly. I would like to look at your use case to be able to answer definitively.
Not yet. The problem is that NumPy ndarrays can still be passed to a kernel, in which case we do a copy to USM from NumPy prior to the call and copy back after the wait. I deprecated the NumPy support for now. Let me enquire with another set of dpex users whether they will be impacted. If there is no impact, then I will not wait and will just remove the NumPy calls (the work was already started in #866). Once that is done, we can start returning the event from the kernel call.
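For readers unfamiliar with what that implicit NumPy support costs, here is a rough sketch of what it amounts to, written with public dpctl.tensor calls; the actual numba_dpex internals differ, and the kernel launch line is only a placeholder.

```python
import dpctl.tensor as dpt
import numpy as np

a = np.arange(100, dtype=np.int32)
b = np.arange(100, dtype=np.int32)
c = np.zeros_like(a)

# Host -> USM copies before the kernel call.
a_usm = dpt.asarray(a)
b_usm = dpt.asarray(b)
c_usm = dpt.asarray(c)

# vecadd[100](a_usm, b_usm, c_usm)  # kernel launch followed by a wait()

# USM -> host copy after the wait, so the caller sees the result in `c`.
c[:] = dpt.asnumpy(c_usm)
```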
Yes, AOT is fixed now. See this example: https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/examples/kernel/kernel_specialization.py.
By AOT I meant also being able to further distribute pre-compiled code in Python packages (see the relevant Numba ahead-of-time compilation docs). I'm not sure this is something we would really want for our project, and I wouldn't see it as a short-term priority, but if that's something you have in mind for the future it would be interesting to know.
If you decide to keep support for numpy ndarrays in the future (something we indeed don't use for our project), that would also be worth knowing.
NumPy support is deprecated and I will drop it as soon as 0.20 is out of the door. As for allowing kernels to execute asynchronously, let me track it in a separate issue. We should be able to return an event from the kernel launch once the implicit NumPy copies are gone.
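A sketch of what that asynchronous usage could look like; the event-returning kernel call is hypothetical and not an existing numba_dpex API, while dpctl.SyclEvent and its wait() method are real dpctl pieces.

```python
import dpctl
import dpnp

a = dpnp.arange(100)
b = dpnp.arange(100)
c = dpnp.zeros_like(a)

# Hypothetical future form: the launch returns a dpctl.SyclEvent instead of
# blocking, so the host can overlap other work and synchronize explicitly.
ev = vecadd[100](a, b, c)  # hypothetical return value: dpctl.SyclEvent
# ... other host-side work could run here ...
ev.wait()                  # explicit synchronization point
```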
Closing this issue, as the main concern (supporting eager compilation) now works as expected in main.
Following IntelPython/dpctl#894, I want to use `numba_dpex` with eager compilation, so that I can access attributes such as `preferred_work_group_size_multiple` attached to the corresponding `compiler.Kernel` object available in the `JitKernel.definitions` dictionary.

This partially works: as long as the signature passed to `dpex.kernel` uses the `numba.Array` type (and nothing else), compilation actually takes effect and I can access the attributes. However, when running the kernel later on, a new signature is generated with `USMNdArrayType` types, a new compilation step happens, and a new `Kernel` is added to `definitions` with the same signature but using `USMNdArrayType` instead of `numba.Array`.

As a result, the eager compilation is mostly pointless, since the eagerly compiled kernel will not be used; an additional lazy compilation will take place instead.

A solution to that might be to relax `npytypes_array_to_dpex_array` to also accept `USMNdArrayType` arrays?
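For completeness, the access pattern described above would look roughly like this; every name here (`definitions`, the `compiler.Kernel` objects it holds, `preferred_work_group_size_multiple`) is taken from this issue, refers to numba_dpex internals rather than a stable public API, and may differ between versions.

```python
# Hypothetical sketch: inspect the eagerly compiled kernels cached on the
# decorated function and read their preferred work-group size multiple.
for sig, kernel in vecadd.definitions.items():
    print(sig, kernel.preferred_work_group_size_multiple)
```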