Using numba kernel caching for numba-dpex spirv kernel #815
Conversation
chudur-budur commented on Nov 4, 2022
- Have you provided a meaningful PR description?
- WIP for Implementing a more robust caching mechanism #814
- Have you added a test, reproducer or referred to an issue with a reproducer?
- Have you tested your changes locally for CPU and GPU devices?
- Have you made sure that new changes do not introduce compiler warnings?
- If this PR is a work in progress, are you filing the PR as a draft?
- The compiler module only contains the compiler pipeline to compile SpirvKernel objects.
- Creates a separate module for the unpack and pack functions for kernel arguments.
- The new API is intended for use from the Dispatcher class.
- The concept of a kernel was decoupled from the notion of dispatching a kernel. The present implementation in compiler.py intermixes the two, making it hard to separate the compute-follows-data based kernel launch from the legacy `dpctl.device_context` based behavior.
- Deprecates support for numpy arrays as kernel args.
- Deprecates support for the square bracket notation using `__getitem__` to provide global and local ranges for a kernel launch (a usage sketch follows this list).
- Changes the behavior of specializing a kernel using only a signature. The new way to specialize will require a device type and a backend.
- Improvements to exception messages using custom exceptions.
- The new API is now inside `numba_dpex.core.kernel_interface`.
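A hedged usage sketch (not code from this PR) contrasting the deprecated square-bracket launch with the keyword-argument form that the deprecation warnings quoted later in this thread suggest. The exact `__call__` signature and the `dpctl.tensor` setup are assumptions.

```python
# Hedged sketch: deprecated vs. suggested kernel launch syntax.
# The global_range keyword is taken from the deprecation warnings below;
# the dpctl.tensor allocation and call signature are assumptions.
import dpctl.tensor as dpt
import numba_dpex as dpex


@dpex.kernel
def data_parallel_sum(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]


a = dpt.ones(100, dtype=dpt.float32)
b = dpt.ones(100, dtype=dpt.float32)
c = dpt.zeros(100, dtype=dpt.float32)

# Deprecated: __getitem__ to set the global range.
data_parallel_sum[(100,)](a, b, c)

# Suggested by the warnings: pass global_range as a keyword to __call__.
data_parallel_sum(a, b, c, global_range=(100,))
```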
numba_dpex/caching.py (Outdated)
""" | ||
Returns the unserialized CompileResult | ||
""" | ||
return compiler.CompileResult._rebuild(target_context, *payload) |
@diptorupd Is this going to work with `numba_dpex`?
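For context, a minimal sketch of the reduce/rebuild pair that Numba's on-disk cache uses and that the quoted line mirrors; whether the same pair works for a dpex compile result is exactly the open question here. It assumes Numba's private `_CacheImpl` base class in `numba.core.caching`; the subclass name is illustrative, not the code in this PR.

```python
# A minimal sketch, assuming Numba's private caching API (numba.core.caching).
# The subclass name is illustrative; instantiation needs additional wiring
# (a source locator), which is omitted here.
from numba.core import compiler
from numba.core.caching import _CacheImpl


class DpexCacheImplSketch(_CacheImpl):
    def reduce(self, cres):
        # Turn a CompileResult into a picklable payload.
        return cres._reduce()

    def rebuild(self, target_context, payload):
        # Returns the unserialized CompileResult (the line quoted above).
        return compiler.CompileResult._rebuild(target_context, *payload)

    def check_cachable(self, cres):
        # A real implementation would reject lifted loops, dynamic globals, etc.
        return True
```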
numba_dpex/caching.py (Outdated)
```python
if not self._impl.check_cachable(data):
    return
self._impl.locator.ensure_cache_path()
# key = self._index_key(sig, data.codegen)
```
@diptorupd This `data` is a compiled object from spirv_kernel.py (here). But the compiled kernel doesn't have `codegen`.
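One possible workaround for the missing `codegen` attribute (a sketch, not the PR's fix): build the index key from the signature plus a hash of the Python function, which is roughly what Numba's `Cache._index_key` does apart from the `codegen.magic_tuple()` component. The function name and exact key layout here are assumptions.

```python
# Hedged sketch: an index key that does not touch data.codegen.
# Numba's Cache._index_key(sig, codegen) also mixes in codegen.magic_tuple();
# that part is dropped because the compiled SPIR-V kernel has no codegen.
import hashlib


def index_key_without_codegen(sig, py_func):
    """Cache key built from the signature and the kernel's bytecode."""
    codebytes = py_func.__code__.co_code
    if py_func.__closure__ is not None:
        cvars = tuple(cell.cell_contents for cell in py_func.__closure__)
        cvarbytes = repr(cvars).encode()
    else:
        cvarbytes = b""
    return (
        str(sig),
        hashlib.sha256(codebytes).hexdigest(),
        hashlib.sha256(cvarbytes).hexdigest(),
    )
```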
numba_dpex/caching.py (Outdated)
```python
self._impl.locator.ensure_cache_path()
# key = self._index_key(sig, data.codegen)
key = self._index_key(sig)
# data = self._impl.reduce(data)
```
@diptorupd This `reduce` doesn't work on the compiled kernel either. I am getting this:
python driver.py
kernel: 2
-----> Dispatcher.__init__()
/localdisk/work/akmkhale/numba-dpex/driver.py:28: DeprecationWarning: The [] (__getitem__) method to set global and local ranges for launching a kernel is deprecated. Use the execute function instead.
data_parallel_sum[(100,)](a, b, c)
/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py:334: UserWarning: Use of __getitem__ to set the global_range attribute is deprecated. Use the keyword argument "global_range" of __call__ method to set the attribute.
warn(
/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py:359: UserWarning: Kernel to be submitted without a local range letting the SYCL runtime select a local range. The behavior can lead to suboptimal performance in certain cases. Consider setting the local range value for the kernel execution.
The local_range keyword may be made a required argument in the future.
warn(
-----> dispatcher.kernel_name: data_parallel_sum
-----> spriv_kernel._compile()
-----> caching.load_overload()
-----> spirv_kernel._compile().cres == None
Traceback (most recent call last):
File "/localdisk/work/akmkhale/numba-dpex/driver.py", line 36, in <module>
main()
File "/localdisk/work/akmkhale/numba-dpex/driver.py", line 28, in main
data_parallel_sum[(100,)](a, b, c)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/dispatcher.py", line 427, in __call__
kernel.compile(
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/spirv_kernel.py", line 167, in compile
cres = self._compile(
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/core/kernel_interface/spirv_kernel.py", line 120, in _compile
self._cache.save_overload(cres.signature, cres)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/caching.py", line 95, in save_overload
data = self._impl.reduce(data)
File "/localdisk/work/akmkhale/numba-dpex/numba_dpex/caching.py", line 28, in reduce
return cres._reduce()
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/compiler.py", line 183, in _reduce
libdata = self.library.serialize_using_object_code()
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/codegen.py", line 922, in serialize_using_object_code
data = (self._get_compiled_object(),
File "/nfs/site/home/akmkhale/.conda/envs/numba-dpex/lib/python3.9/site-packages/numba/core/codegen.py", line 630, in _get_compiled_object
raise RuntimeError("no compiled object yet for %s" % (self,))
RuntimeError: no compiled object yet for <Library 'data_parallel_sum' at 0x7f1f1c4fbcd0>
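The traceback shows `CompileResult._reduce()` reaching for host object code via `serialize_using_object_code()`, which the SPIR-V pipeline never produces, hence "no compiled object yet". A hedged sketch of the direction the later commits describe (serializing SpirvKernel attributes in reduce()/rebuild() instead); all attribute and class names below are illustrative, not the actual numba-dpex API.

```python
# Hedged sketch: serialize the pieces the SPIR-V kernel actually has instead
# of delegating to CompileResult._reduce(). Attribute names are assumptions.
class _CachedKernel:
    """Minimal container for a kernel restored from the on-disk cache."""

    def __init__(self, module_name, device_driver_ir_module):
        self.module_name = module_name
        self.device_driver_ir_module = device_driver_ir_module


class SpirvPayloadSketch:
    @staticmethod
    def reduce(kernel):
        # Pack only picklable attributes of the compiled kernel.
        return (kernel.module_name, kernel.device_driver_ir_module)

    @staticmethod
    def rebuild(target_context, payload):
        module_name, device_driver_ir_module = payload
        # Reconstruct a lightweight stand-in for the compiled kernel.
        return _CachedKernel(module_name, device_driver_ir_module)
```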
…Python/numba-dpex into refactor/kernel_interfaces
… SpirvKernel attributes in reduce()/rebuild() Moving caching.py into numba_dpex/core
@diptorupd The caching mechanism is correct now; it uses all of the Numba machinery. I just need to add the backend and device type to the key.
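A hedged sketch of what "add backend and device type into the key" could look like, extending the key idea from the earlier comment; parameter names are illustrative.

```python
# Hedged sketch: widen the cache key so kernels compiled for different
# backends/devices do not collide. Parameter names are illustrative.
def index_key_with_device(sig, codebytes_hash, backend, device_type):
    """Cache key that distinguishes, e.g., level_zero vs. opencl backends
    and cpu vs. gpu device types for the same signature."""
    return (str(sig), codebytes_hash, str(backend), str(device_type))
```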
Superseded by #843