Can't reuse dpex.func device functions with different signatures #867

Closed · fcharras opened this issue Jan 11, 2023 · 5 comments · Fixed by #877
Labels: user (user submitted issue)
@fcharras

This used to work with numba_dpex<=0.18.1 but fails with >=0.19:

import numba_dpex as dpex
import dpctl.tensor as dpt
import numpy as np


@dpex.func
def g(array_in, idx, const):
    array_in[idx] = const

@dpex.kernel
def kernel_a(array_in):
    idx = dpex.get_global_id(0)
    g(array_in, idx, np.int64(0))

@dpex.kernel
def kernel_b(array_in):
    idx = dpex.get_global_id(0)
    g(array_in, idx, np.int32(0))  # NB: call with inputs of different types than in kernel_a

   
dtype = np.float32
size = 16
array_in = dpt.zeros((size,), dtype=dtype)

kernel_a[size, size](array_in)
kernel_b[size, size](array_in)

With numba_dpex>=0.19, this snippet raises the following exception:

<...traceback elided...>
LoweringError: Failed in dpex_nopython mode pipeline (step: Custom Lowerer with auto-offload support)
No definition for lowering <function get_global_id at 0x7f47c2edf700>(uint32,) -> int64

File "<ipython-input-14-e32b2bd5f6a0>", line 17:
def kernel_b(array_in):
    idx = dpex.get_global_id(0)
    ^

During: lowering "idx = call $4load_method.1($const6.2, func=$4load_method.1, args=[Var($const6.2, <ipython-input-14-e32b2bd5f6a0>:17)], kws=(), vararg=None, varkwarg=None, target=None)" at <ipython-input-14-e32b2bd5f6a0> (17)

A workaround is to make sure that each kernel calls its own copy of the device function, i.e. a distinct function object bound to a different name, e.g.:

import numba_dpex as dpex
import dpctl.tensor as dpt
import numpy as np


def make_g():
    @dpex.func
    def g(array_in, idx, const):
        array_in[idx] = const
    return g

g = make_g()
@dpex.kernel
def kernel_a(array_in):
    idx = dpex.get_global_id(0)
    g(array_in, idx, np.int64(0))

g_ = make_g()
@dpex.kernel
def kernel_b(array_in):
    idx = dpex.get_global_id(0)
    g_(array_in, idx, np.int32(0))

   
dtype = np.float32
size = 16
array_in = dpt.zeros((size,), dtype=dtype)

kernel_a[size, size](array_in)
kernel_b[size, size](array_in)
@diptorupd (Contributor)

@fcharras #877 fixes the issue along with overall improvements to how we cache and specialize func-decorated functions. Can you please test the branch and confirm that the issue you were seeing is addressed?
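
For context only: the gist of the fix, as described above, is that compiled device functions need to be cached per call signature rather than per function object, so that the np.int64 and np.int32 calls to g each get their own specialization. The sketch below is a purely illustrative, made-up example of that caching idea in plain Python; the class and parameter names are hypothetical and this is not the actual #877 implementation.

# Hypothetical sketch (not numba_dpex code): cache compiled device functions
# keyed by the tuple of argument types, so each distinct call signature gets
# its own specialization instead of reusing the first compiled version.
class DeviceFunctionCache:
    def __init__(self, pyfunc, compile_fn):
        self._pyfunc = pyfunc
        self._compile = compile_fn   # stand-in for the real compilation step
        self._specializations = {}   # arg-type tuple -> compiled artifact

    def get_or_compile(self, argtypes):
        key = tuple(argtypes)
        if key not in self._specializations:
            self._specializations[key] = self._compile(self._pyfunc, key)
        return self._specializations[key]

# Toy usage with a dummy "compiler" that only records the signature:
cache = DeviceFunctionCache(pyfunc=None,
                            compile_fn=lambda f, key: ("compiled", key))
print(cache.get_or_compile(("float32[:]", "int64", "int64")))   # cache miss, compiles
print(cache.get_or_compile(("float32[:]", "int64", "int32")))   # separate entry, no clash
print(cache.get_or_compile(("float32[:]", "int64", "int64")))   # cache hit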

@fcharras (Author)

The cache in #877 looks like it's working. I've checked that there are cache hits where expected (at the line "device_driver_ir_module, kernel_module_name = artifact").

I have another issue, but I don't think it's related to #877, since I checked for cache hits and misses. My user code works fine with numba_dpex==0.19.0. But now it's super slow and some kernels seem to output wrong values; I think it comes from more recent commits on main. I can try to bisect.

@fcharras (Author)

The new problems are not related to #877 but to #876, reported in #816.

@diptorupd (Contributor)

Thanks @fcharras for your review. I am going ahead and merging #877 and closing this ticket.

I am opening a separate issue to track the performance regression introduced by #816.

@chudur-budur (Contributor)

“But now it's super slow and some kernels seem to output wrong values”

Which kernels are running slow and outputting wrong values?
