launching a kernel section.
Diptorup Deb committed Mar 19, 2024
1 parent bf21ff8 commit 67400a3
Showing 5 changed files with 65 additions and 15 deletions.
3 changes: 3 additions & 0 deletions docs/source/ext_links.txt
@@ -31,3 +31,6 @@
.. _oneAPI GPU optimization guide: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/general-purpose-computing-on-gpu.html
.. _dpctl.tensor.usm_ndarray: https://intelpython.github.io/dpctl/latest/docfiles/dpctl/usm_ndarray.html#dpctl.tensor.usm_ndarray
.. _dpnp.ndarray: https://intelpython.github.io/dpnp/reference/ndarray.html

.. _Dispatcher: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html#dispatcher-objects
.. _Unboxes: https://numba.readthedocs.io/en/stable/extending/interval-example.html#boxing-and-unboxing
@@ -0,0 +1,4 @@
.. _launching-an-async-kernel:

Async kernel execution
======================
43 changes: 43 additions & 0 deletions docs/source/user_guide/kernel_programming/call-kernel.rst
@@ -0,0 +1,43 @@
.. _launching-a-kernel:

Launching a kernel
==================

A kapi function decorated with ``kernel`` produces a ``KernelDispatcher`` object,
which is a type of Numba* `Dispatcher`_ object. However, unlike regular Numba*
Dispatcher objects, a ``KernelDispatcher`` object cannot be invoked directly from
either CPython or another compiled Numba* ``jit`` function. To invoke a
``kernel``-decorated function, a programmer has to use the
:func:`numba_dpex.experimental.call_kernel` function.

To invoke a ``KernelDispatcher``, the ``call_kernel`` function requires three
things: the ``KernelDispatcher`` object, the ``Range`` or ``NdRange`` object
over which the kernel is to be executed, and the list of arguments to be passed
to the compiled kernel. Once called with the necessary arguments, the
``call_kernel`` function performs the following steps:

- Compiles the ``KernelDispatcher`` object, specializing it for the provided
  argument types.

- `Unboxes`_ the kernel arguments by converting CPython objects into Numba* or
  numba-dpex objects.

- Infers the execution queue on which to submit the kernel from the provided
  kernel arguments. (TODO: Refer to compute follows data.)

- Submits the kernel to the execution queue.

- Waits for the execution to complete before returning control to the
  caller.
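
The steps above can be pictured with a toy model written in plain Python. This
is only a sketch and does not use numba-dpex: every name in it
(``ToyDispatcher``, ``ToyQueue``, ``ToyArray``, ``toy_call_kernel``) is
hypothetical, and the rule that all array arguments must share exactly one
queue is an assumption made for illustration.

```python
# Toy, pure-Python model of the launch sequence described above.
# All names here are hypothetical and are NOT part of numba-dpex.
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Stands in for an execution queue; records what was submitted."""
    name: str
    log: list = field(default_factory=list)


@dataclass
class ToyArray:
    """Stands in for a device array that remembers its queue."""
    data: list
    queue: ToyQueue


class ToyDispatcher:
    """Keeps one 'compiled' specialization per argument-type signature."""

    def __init__(self, py_func):
        self.py_func = py_func
        self.cache = {}

    def compile(self, arg_types):
        # Step 1: specialize for the argument types (here: just cache).
        if arg_types not in self.cache:
            self.cache[arg_types] = self.py_func
        return self.cache[arg_types]


def toy_call_kernel(dispatcher, global_size, *args):
    compiled = dispatcher.compile(tuple(type(a) for a in args))
    # Step 2: "unbox" arguments; arrays become raw lists, scalars pass through.
    unboxed = [a.data if isinstance(a, ToyArray) else a for a in args]
    # Step 3: infer the queue from the array arguments (assumed rule:
    # the arrays must share exactly one queue).
    queues = {id(a.queue): a.queue for a in args if isinstance(a, ToyArray)}
    if len(queues) != 1:
        raise ValueError("array arguments must share exactly one queue")
    (queue,) = queues.values()
    # Step 4: submit the kernel to the queue.
    queue.log.append(("submit", compiled.__name__, global_size))
    # Step 5: run every work item to completion before returning.
    for i in range(global_size):
        compiled(i, *unboxed)
    queue.log.append(("done", compiled.__name__))


def vecadd(i, a, b, c):
    c[i] = a[i] + b[i]


q = ToyQueue("gpu")
a = ToyArray([1.0, 2.0, 3.0], q)
b = ToyArray([10.0, 20.0, 30.0], q)
c = ToyArray([0.0, 0.0, 0.0], q)
toy_call_kernel(ToyDispatcher(vecadd), 3, a, b, c)
print(c.data)
```

The model also shows why a kernel needs at least one array argument: without
one there is nothing from which to infer a queue, and the toy launcher raises
``ValueError``.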

The ``call_kernel`` function can be invoked both from CPython and from another
Numba* compiled function. Note that ``call_kernel`` supports only synchronous
kernel execution; for asynchronous execution, use the ``call_kernel_async``
function (refer to :ref:`launching-an-async-kernel`).
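
The difference between the two modes can be illustrated with an analogy built
on Python's standard ``concurrent.futures`` module. This is not numba-dpex
code and says nothing about the signature or return value of
``call_kernel_async``; the helpers ``launch_sync`` and ``launch_async`` are
hypothetical names used only to contrast "wait before returning" with "return
a handle and let the caller wait".

```python
# Analogy only: a thread pool stands in for an execution queue.
from concurrent.futures import ThreadPoolExecutor


def square_all(src, dst):
    for i, x in enumerate(src):
        dst[i] = x * x


def launch_sync(pool, fn, *args):
    # Submit the work and block until it has finished.
    pool.submit(fn, *args).result()


def launch_async(pool, fn, *args):
    # Submit the work and return immediately with a handle.
    return pool.submit(fn, *args)


with ThreadPoolExecutor(max_workers=1) as pool:
    out_sync = [0, 0, 0]
    launch_sync(pool, square_all, [1, 2, 3], out_sync)
    # out_sync is safe to read here: the work has already completed.

    out_async = [0, 0, 0]
    handle = launch_async(pool, square_all, [4, 5, 6], out_async)
    # out_async must not be read until the caller waits on the handle.
    handle.result()
```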


.. seealso::

    Refer to the API documentation for
    :func:`numba_dpex.experimental.launcher.call_kernel` for more details.
12 changes: 8 additions & 4 deletions docs/source/user_guide/kernel_programming/index.rst
@@ -236,8 +236,10 @@ users should first convert their input tensor or ndarray object into either of
the two supported array types, both of which support DLPack.


Launching a kernel
==================
.. Launching a kernel
.. ==================
.. include:: ./call-kernel.rst

Advanced concepts
*****************
@@ -254,8 +256,10 @@ Group barrier synchronization
Atomic operations
=================

Async kernel execution
======================
.. Async kernel execution
.. ======================
.. include:: ./call-kernel-async.rst

Specializing a kernel or a device_func
======================================
18 changes: 7 additions & 11 deletions docs/source/user_guide/kernel_programming/writing-range-kernel.rst
@@ -121,17 +121,13 @@ kernel:
* At least one argument of a kernel should be an array. The requirement is so
that the kernel launcher (:func:`numba_dpex.experimental.call_kernel`) can
determine the execution queue on which to launch the kernel. Refer to
the "Launching a kernel" section for more details.

A range kernel has to be executed by calling the
:py:func:`numba_dpex.experimental.launcher.call_kernel` function. The execution
range for the kernel is specified by creating an instance of a
:class:`numba_dpex.kernel_api.Range` class and passing the ``Range`` object as
an argument to ``call_kernel``. The ``call_kernel`` function does three things:
compiles the kernel if needed, "unboxes" all kernel arguments by converting
CPython objects into numba-dpex objects, and finally submitting the kernel to an
execution queue with the specified execution range. Refer the
:doc:`../../autoapi/index` for further details.
the :ref:`launching-a-kernel` section for more details.

A range kernel has to be executed via the
:py:func:`numba_dpex.experimental.launcher.call_kernel` function by passing in
an instance of the :class:`numba_dpex.kernel_api.Range` class. Refer to the
:ref:`launching-a-kernel` section for more details on how to launch a range
kernel.
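
A minimal sketch of such a launch is shown below. It assumes that ``kernel``
and ``call_kernel`` are importable from ``numba_dpex.experimental`` (as
referenced in this documentation), that ``call_kernel`` is called as
``call_kernel(kernel_fn, range, *kernel_args)``, and that ``dpnp`` and a SYCL
device are available. The helper ``run_vecadd`` is a hypothetical name, and it
returns ``None`` when the packages cannot be imported.

```python
def run_vecadd(n=16):
    """Launch a vector-add range kernel; return None if numba-dpex is absent."""
    try:
        import dpnp
        from numba_dpex import kernel_api as kapi
        # Assumed import location, following the references in this document.
        from numba_dpex.experimental import call_kernel, kernel
    except ImportError:
        return None

    @kernel
    def vecadd(item: kapi.Item, a, b, c):
        # Each work item handles one element of the arrays.
        i = item.get_id(0)
        c[i] = a[i] + b[i]

    a = dpnp.ones(n)
    b = dpnp.ones(n)
    c = dpnp.empty_like(a)

    # Pass the dispatcher, the execution range, then the kernel arguments.
    # The call returns only after the kernel has finished.
    call_kernel(vecadd, kapi.Range(n), a, b, c)
    return dpnp.asnumpy(c).tolist()


result = run_vecadd()
```

Because all three arguments are arrays allocated on the same default device,
the launcher has what it needs to infer the execution queue.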

A range kernel is meant to express a basic `parallel-for` calculation that is
ideally suited for embarrassingly parallel kernels such as elementwise