From ec660c35b3fa16db92f2ed3b86d1e5d6cc43d96d Mon Sep 17 00:00:00 2001
From: Diptorup Deb
Date: Wed, 17 May 2023 00:57:29 -0500
Subject: [PATCH] Updates to overview

---
 docs/index.rst             |  21 +++++---
 docs/sources/ext_links.txt |   6 ++-
 docs/sources/overview.rst  | 108 +++++++++++++++++++++++++++----------
 3 files changed, 100 insertions(+), 35 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index 9c3c4329aa..6e4a7ce5c2 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,16 +2,23 @@ Welcome to numba-dpex's documentation!
 ======================================
 
 Numba data-parallel extension (`numba-dpex
-`_) is an Intel |reg|-developed
+`_) is a standalone
 extension to the `Numba `_ JIT compiler. The
-extension adds kernel programming and automatic offload capabilities to the
-Numba compiler. Numba-dpex is part of `Intel oneAPI Base Toolkit
-`_
+extension adds two features to Numba: an architecture-agnostic kernel
+programming API, and a backend for Numba's ``jit`` decorator that can
+parallelize NumPy-like array expressions and function calls on different
+data-parallel architectures. The parallelization feature for a NumPy-like API
+is provided by adding type and compilation support for the
+`dpnp `_ library, a data-parallel NumPy
+drop-in replacement library.
+
+Numba-dpex is part of `Intel AI Analytics Toolkit
+`_
 and distributed with the `Intel Distribution for Python*
 `_.
-The goal of the extension is to make it easy for Python programmers to
-write efficient and portable code for a mix of architectures across CPUs, GPUs,
-FPGAs and other accelerators.
+The extension is also available on Anaconda Cloud and as a Docker image on
+GitHub. Please refer to the `Getting Started `_ page to
+learn more.
 
 Numba-dpex provides an API to write data-parallel kernels directly in Python
 and compiles the kernels to a lower-level kernels that are executed using a `SYCL
diff --git a/docs/sources/ext_links.txt b/docs/sources/ext_links.txt
index 24e360fa5c..2a06304ebe 100644
--- a/docs/sources/ext_links.txt
+++ b/docs/sources/ext_links.txt
@@ -2,11 +2,13 @@
 **********************************************************
 THESE ARE EXTERNAL PROJECT LINKS USED IN THE DOCUMENTATION
 **********************************************************
+
 .. _NumPy*: https://numpy.org/
 .. _Numba*: https://numba.pydata.org/
+.. _numba-dpex: https://github.com/IntelPython/numba-dpex
 .. _Python* Array API Standard: https://data-apis.org/array-api/
-.. _Intel Distribution for Python*: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html
 .. _OpenCl*: https://www.khronos.org/opencl/
+.. _oneAPI Level Zero: https://spec.oneapi.io/level-zero/latest/index.html
 .. _DPC++: https://www.apress.com/gp/book/9781484255735
 .. _Data Parallel Extension for Numba*: https://intelpython.github.io/numba-dpex/latest/index.html
 .. _SYCL*: https://www.khronos.org/sycl/
@@ -14,6 +16,8 @@
 .. _Data Parallel Extension for Numpy*: https://intelpython.github.io/dpnp/
 .. _IEEE 754-2019 Standard for Floating-Point Arithmetic: https://standards.ieee.org/ieee/754/6210/
 .. _Intel oneAPI Base Toolkit: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html
+.. _Intel Distribution for Python*: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html
+.. _Intel AI Analytics Toolkit: https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html
 .. _Data Parallel Extensions for Python*: https://intelpython.github.io/DPEP/main/
 .. _Intel VTune Profiler: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 .. _Intel Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/advisor.html
diff --git a/docs/sources/overview.rst b/docs/sources/overview.rst
index befbd6b37d..7b0b760340 100644
--- a/docs/sources/overview.rst
+++ b/docs/sources/overview.rst
@@ -1,44 +1,98 @@
 .. _overview
+.. include:: ./ext_links.txt
 
 Overview
 ========
 
+Data-Parallel Extensions for Numba* (`numba-dpex`_) is a standalone extension to
+the `Numba`_ JIT compiler. The extension adds two new features to Numba: an
+architecture-agnostic kernel programming API, and a backend extension that can
+parallelize NumPy-style array expressions and function calls on different types
+of data-parallel architectures.
 
-Numba data-parallel extension (`numba-dpex
-`_) is an Intel |reg|-developed
-extension to the `Numba `_ JIT compiler.
+Numba-dpex is part of `Intel AI Analytics Toolkit`_ and distributed with the
+`Intel Distribution for Python*`_. The extension is also available on Anaconda
+Cloud and as a Docker image on GitHub. Please refer to the
+:doc:`getting_started` page to learn more.
 
-Numba-dpex extends Numba* by adding a kernel programming API based on `SYCL
-`_ and compilation support for
-Data-parallel Extension For NumPy*
-(`dpnp `_) a drop-in replacement for
-NumPy* based on SYCL.
+Main Features
+-------------
+- :doc:`user_manual/kernel_programming/index`
+  The kernel API has a design similar to what is provided by Numba's
+  ``cuda.jit`` module. However, the API uses the `SYCL`_ language runtime and
+  as such is extensible to several hardware categories. Presently, the API
+  supports only SPIR-V-based OpenCL and `oneAPI Level Zero`_ devices that are
+  supported by the Intel® `DPC++`_ SYCL compiler runtime.
 
-The
-extension adds kernel programming and automatic offload capabilities to the
-Numba compiler. Numba-dpex is part of `Intel oneAPI Base Toolkit
-`_
-and distributed with the `Intel Distribution for Python*
-`_.
-The goal of the extension is to make it easy for Python programmers to
-write efficient and portable code for a mix of architectures across CPUs, GPUs,
-FPGAs and other accelerators.
+  A simple vector addition kernel can be expressed using the API as follows:
+
+  .. code-block:: python
+
+      import dpnp
+      import numba_dpex as dpex
+
+
+      @dpex.kernel
+      def sum(a, b, c):
+          i = dpex.get_global_id(0)
+          c[i] = a[i] + b[i]
+
+
+      a = dpnp.ones(1024, device="gpu")
+      b = dpnp.ones(1024, device="gpu")
+      c = dpnp.empty_like(a)
+
+      sum[dpex.Range(1024)](a, b, c)
+      print(c)
+
+  In the above example, since the programmer allocated the dpnp arrays on the
+  default ``gpu`` device, numba-dpex will compile and then execute the kernel
+  for that device. To change the execution target to a CPU, only the
+  ``device`` keyword needs to be changed to ``cpu`` when allocating the dpnp
+  arrays.
+
+- :doc:`user_manual/auto-offload`
+
+  The new backend extension that adds automatic parallelization support has a
+  user interface similar to Numba's existing loop-parallelizer. The feature
+  enables a programmer to "offload" NumPy-style vector expressions, library
+  calls, and ``prange`` loops to different hardware and execute them in
+  parallel. A key difference from Numba's loop-parallelizer is the ability to
+  parallelize on devices other than multicore CPUs.
+
+  The feature requires the
+  `dpnp `_ library, a data-parallel
+  drop-in replacement for `NumPy*`_.
+
+
+  A programmer only needs to replace NumPy* function calls, array expressions,
+  and loops with the corresponding dpnp API and array type, and use
+  numba-dpex's decorator in place of the default Numba decorator, to
+  parallelize the expressions on different types of hardware.
+
+Contributing
+============
+
+Refer to the `contributing guide
+`_ for
+information on coding style and standards used in numba-dpex.
+
+License
+=======
+
+Numba-dpex is licensed under the Apache License 2.0, which can be found in
+`LICENSE `_. All usage and
+contributions to the project are subject to the terms and conditions of this
+license.
 
-Numba-dpex provides an API to write data-parallel kernels directly in Python and
-compiles the kernels to a lower-level kernels that are executed using a `SYCL
-`_ runtime library. Presently, only Intel's
-`DPC++ `_
-SYCL runtime is supported via the `dpctl
-`_ package, and only OpenCL and Level Zero
-devices are supported. Support for other SYCL runtime libraries and hardwares
-may be added in the future.
 
 Along with the kernel programming API an auto-offload feature is also provided.
 The feature enables automatic generation of kernels from data-parallel NumPy
 library calls and array expressions, Numba ``prange`` loops, and `other
 "data-parallel by construction" expressions
 `_ that Numba is
-able to parallelize. Following two examples demonstrate the two ways in
-which kernels may be written using numba-dpex.
+able to parallelize. The following two examples demonstrate the two ways in
+which kernels may be written using numba-dpex.
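A minimal sketch of the auto-offload workflow described in the overview changes
above, assuming numba-dpex exposes a ``dpjit`` decorator for the
parallelization backend; the decorator name and the example itself are
illustrative and are not part of this patch:

.. code-block:: python

    import dpnp
    import numba_dpex as dpex
    from numba import prange


    # The only changes from a plain Numba version are the dpnp arrays and the
    # numba-dpex decorator used in place of Numba's default jit decorator.
    # ``dpjit`` is assumed here for illustration; it is not named in this diff.
    @dpex.dpjit
    def vector_sum(a, b, c):
        # A ``prange`` loop over dpnp arrays; the loop is parallelized for the
        # device on which the arrays were allocated.
        for i in prange(c.shape[0]):
            c[i] = a[i] + b[i]


    a = dpnp.ones(1024, device="gpu")
    b = dpnp.ones(1024, device="gpu")
    c = dpnp.empty_like(a)

    vector_sum(a, b, c)
    print(c)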