Using numba kernel caching for numba-dpex spirv kernel #815
Conversation
chudur-budur commented on Nov 4, 2022
- Have you provided a meaningful PR description?
- WIP for Implementing a more robust caching mechanism #814
- Have you added a test, reproducer or referred to an issue with a reproducer?
- Have you tested your changes locally for CPU and GPU devices?
- Have you made sure that new changes do not introduce compiler warnings?
- If this PR is a work in progress, are you filing the PR as a draft?
- The compiler module only contains the compiler pipeline to compile SpirvKernel objects.
- Creates a separate module for the unpack and pack functions for kernel arguments.
- The new API is intended for use from the Dispatcher class.
- The concept of a kernel was decoupled from the notion of dispatching a kernel. The present implementation in compiler.py intermixes the two, making it hard to separate the compute-follows-data based kernel launch from the legacy `dpctl.device_context` based behavior.
- Deprecates support for numpy arrays as kernel args.
- Deprecates support for the square bracket notation using `__getitem__` to provide global and local ranges for a kernel launch (a usage sketch follows this list).
- Changes the behavior of specializing a kernel using only a signature. The new way to specialize will require a device type and a backend.
- Improvements to exception messages using custom exceptions.
- The new API is now inside `numba_dpex.core.kernel_interface`.
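A hedged usage sketch (not code from this PR) contrasting the deprecated square-bracket launch with the keyword-argument form that the deprecation warnings quoted later in this thread suggest. The exact `__call__` signature and the `dpctl.tensor` setup are assumptions.

```python
# Hedged sketch: deprecated vs. suggested kernel launch syntax.
# The global_range keyword is taken from the deprecation warnings below;
# the dpctl.tensor allocation and call signature are assumptions.
import dpctl.tensor as dpt
import numba_dpex as dpex


@dpex.kernel
def data_parallel_sum(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]


a = dpt.ones(100, dtype=dpt.float32)
b = dpt.ones(100, dtype=dpt.float32)
c = dpt.zeros(100, dtype=dpt.float32)

# Deprecated: __getitem__ to set the global range.
data_parallel_sum[(100,)](a, b, c)

# Suggested by the warnings: pass global_range as a keyword to __call__.
data_parallel_sum(a, b, c, global_range=(100,))
```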
numba_dpex/caching.py (Outdated)
""" | ||
Returns the unserialized CompileResult | ||
""" | ||
return compiler.CompileResult._rebuild(target_context, *payload) |
@diptorupd Is this going to work with `numba_dpex`?
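For context, a minimal sketch of the reduce/rebuild pair that Numba's on-disk cache uses and that the quoted line mirrors; whether the same pair works for a dpex compile result is exactly the open question here. It assumes Numba's private `_CacheImpl` base class in `numba.core.caching`; the subclass name is illustrative, not the code in this PR.

```python
# A minimal sketch, assuming Numba's private caching API (numba.core.caching).
# The subclass name is illustrative; instantiation needs additional wiring
# (a source locator), which is omitted here.
from numba.core import compiler
from numba.core.caching import _CacheImpl


class DpexCacheImplSketch(_CacheImpl):
    def reduce(self, cres):
        # Turn a CompileResult into a picklable payload.
        return cres._reduce()

    def rebuild(self, target_context, payload):
        # Returns the unserialized CompileResult (the line quoted above).
        return compiler.CompileResult._rebuild(target_context, *payload)

    def check_cachable(self, cres):
        # A real implementation would reject lifted loops, dynamic globals, etc.
        return True
```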
numba_dpex/caching.py (Outdated)
```python
if not self._impl.check_cachable(data):
    return
self._impl.locator.ensure_cache_path()
# key = self._index_key(sig, data.codegen)
```
@diptorupd This `data` is a compiled object from spirv_kernel.py (here). But the compiled kernel doesn't have `codegen`.
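One possible workaround for the missing `codegen` attribute (a sketch, not the PR's fix): build the index key from the signature plus a hash of the Python function, which is roughly what Numba's `Cache._index_key` does apart from the `codegen.magic_tuple()` component. The function name and exact key layout here are assumptions.

```python
# Hedged sketch: an index key that does not touch data.codegen.
# Numba's Cache._index_key(sig, codegen) also mixes in codegen.magic_tuple();
# that part is dropped because the compiled SPIR-V kernel has no codegen.
import hashlib


def index_key_without_codegen(sig, py_func):
    """Cache key built from the signature and the kernel's bytecode."""
    codebytes = py_func.__code__.co_code
    if py_func.__closure__ is not None:
        cvars = tuple(cell.cell_contents for cell in py_func.__closure__)
        cvarbytes = repr(cvars).encode()
    else:
        cvarbytes = b""
    return (
        str(sig),
        hashlib.sha256(codebytes).hexdigest(),
        hashlib.sha256(cvarbytes).hexdigest(),
    )
```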
numba_dpex/caching.py (Outdated)
```python
self._impl.locator.ensure_cache_path()
# key = self._index_key(sig, data.codegen)
key = self._index_key(sig)
# data = self._impl.reduce(data)
```
@diptorupd This `reduce` doesn't work on the compiled kernel either. I am getting this:
python driver.py
kernel: 2
-----> Dispatcher.__init__()
/localdisk/work/akmkhale/numba-dpex/driver.py:28: DeprecationWarning: The [] (__getitem__) method to set global and local ranges for launching a kernel is deprecated. Use the execute function instead.
data_parallel_sum[(100,)](a, b, c)
/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py:334: UserWarning: Use of __getitem__ to set the global_range attribute is deprecated. Use the keyword argument "global_range" of __call__ method to set the attribute.
warn(
/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py:359: UserWarning: Kernel to be submitted without a local range letting the SYCL runtime select a local range. The behavior can lead to suboptimal performance in certain cases. Consider setting the local range value for the kernel execution.
The local_range keyword may be made a required argument in the future.
warn(
-----> dispatcher.kernel_name: data_parallel_sum
-----> spriv_kernel._compile()
-----> caching.load_overload()
-----> spirv_kernel._compile().cres == None
Traceback (most recent call last):
File "/localdisk/work/akmkhale/numba-dpex/driver.py", line 36, in <module>
main()
File "/localdisk/work/akmkhale/numba-dpex/driver.py", line 28, in main
data_parallel_sum[(100,)](a, b, c)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py", line 427, in __call__
kernel.compile(
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/spirv_kernel.py", line 167, in compile
cres = self._compile(
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/spirv_kernel.py", line 120, in _compile
self._cache.save_overload(cres.signature, cres)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/caching.py", line 95, in save_overload
data = self._impl.reduce(data)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/caching.py", line 28, in reduce
return cres._reduce()
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/compiler.py", line 183, in _reduce
libdata = self.library.serialize_using_object_code()
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/codegen.py", line 922, in serialize_using_object_code
data = (self._get_compiled_object(),
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/codegen.py", line 630, in _get_compiled_object
raise RuntimeError("no compiled object yet for %s" % (self,))
RuntimeError: no compiled object yet for <Library 'data_parallel_sum' at 0x7f1f1c4fbcd0>
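The traceback shows `CompileResult._reduce()` reaching for host object code via `serialize_using_object_code()`, which the SPIR-V pipeline never produces, hence "no compiled object yet". A hedged sketch of the direction the later commits describe (serializing SpirvKernel attributes in reduce()/rebuild() instead); all attribute and class names below are illustrative, not the actual numba-dpex API.

```python
# Hedged sketch: serialize the pieces the SPIR-V kernel actually has instead
# of delegating to CompileResult._reduce(). Attribute names are assumptions.
class _CachedKernel:
    """Minimal container for a kernel restored from the on-disk cache."""

    def __init__(self, module_name, device_driver_ir_module):
        self.module_name = module_name
        self.device_driver_ir_module = device_driver_ir_module


class SpirvPayloadSketch:
    @staticmethod
    def reduce(kernel):
        # Pack only picklable attributes of the compiled kernel.
        return (kernel.module_name, kernel.device_driver_ir_module)

    @staticmethod
    def rebuild(target_context, payload):
        module_name, device_driver_ir_module = payload
        # Reconstruct a lightweight stand-in for the compiled kernel.
        return _CachedKernel(module_name, device_driver_ir_module)
```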
…Python/numba-dpex into refactor/kernel_interfaces
… SpirvKernel attributes in reduce()/rebuild() Moving caching.py into numba_dpex/core
@diptorupd The caching mechanism is correct now; it uses all of the Numba machinery. I just need to add the backend and device type to the key.
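A hedged sketch of what "add backend and device type into the key" could look like, extending the key idea from the earlier comment; parameter names are illustrative.

```python
# Hedged sketch: widen the cache key so kernels compiled for different
# backends/devices do not collide. Parameter names are illustrative.
def index_key_with_device(sig, codebytes_hash, backend, device_type):
    """Cache key that distinguishes, e.g., level_zero vs. opencl backends
    and cpu vs. gpu device types for the same signature."""
    return (str(sig), codebytes_hash, str(backend), str(device_type))
```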
Superseded by #843