From ec660c35b3fa16db92f2ed3b86d1e5d6cc43d96d Mon Sep 17 00:00:00 2001
From: Diptorup Deb
Date: Wed, 17 May 2023 00:57:29 -0500
Subject: [PATCH] Updates to overview

---
 docs/index.rst             |  21 +++++---
 docs/sources/ext_links.txt |   6 ++-
 docs/sources/overview.rst  | 108 +++++++++++++++++++++++++++----------
 3 files changed, 100 insertions(+), 35 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index 9c3c4329aa..6e4a7ce5c2 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,16 +2,23 @@ Welcome to numba-dpex's documentation!
 ======================================
 
 Numba data-parallel extension (`numba-dpex
-`_) is an Intel |reg|-developed
+`_) is a standalone
 extension to the `Numba `_ JIT compiler. The
-extension adds kernel programming and automatic offload capabilities to the
-Numba compiler. Numba-dpex is part of `Intel oneAPI Base Toolkit
-`_
+extension adds two features to Numba: an architecture-agnostic kernel
+programming API, and a backend for Numba's ``jit`` decorator that can
+parallelize NumPy-like array expressions and function calls on different
+data-parallel architectures. The parallelization feature for a NumPy-like API
+is provided by adding type and compilation support for the
+`dpnp `_ library, a data-parallel NumPy
+drop-in replacement library.
+
+Numba-dpex is part of `Intel AI Analytics Toolkit
+`_
 and distributed with the `Intel Distribution for Python*
 `_.
-The goal of the extension is to make it easy for Python programmers to
-write efficient and portable code for a mix of architectures across CPUs, GPUs,
-FPGAs and other accelerators.
+The extension is also available on Anaconda Cloud and as a Docker image on
+GitHub. Please refer to the `Getting Started `_ page to
+learn more.
 
 Numba-dpex provides an API to write data-parallel kernels directly in Python
 and compiles the kernels to a lower-level kernels that are executed using a `SYCL
diff --git a/docs/sources/ext_links.txt b/docs/sources/ext_links.txt
index 24e360fa5c..2a06304ebe 100644
--- a/docs/sources/ext_links.txt
+++ b/docs/sources/ext_links.txt
@@ -2,11 +2,13 @@
 **********************************************************
 THESE ARE EXTERNAL PROJECT LINKS USED IN THE DOCUMENTATION
 **********************************************************
+
 .. _NumPy*: https://numpy.org/
 .. _Numba*: https://numba.pydata.org/
+.. _numba-dpex: https://github.com/IntelPython/numba-dpex
 .. _Python* Array API Standard: https://data-apis.org/array-api/
-.. _Intel Distribution for Python*: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html
 .. _OpenCl*: https://www.khronos.org/opencl/
+.. _oneAPI Level Zero: https://spec.oneapi.io/level-zero/latest/index.html
 .. _DPC++: https://www.apress.com/gp/book/9781484255735
 .. _Data Parallel Extension for Numba*: https://intelpython.github.io/numba-dpex/latest/index.html
 .. _SYCL*: https://www.khronos.org/sycl/
@@ -14,6 +16,8 @@
 .. _Data Parallel Extension for Numpy*: https://intelpython.github.io/dpnp/
 .. _IEEE 754-2019 Standard for Floating-Point Arithmetic: https://standards.ieee.org/ieee/754/6210/
 .. _Intel oneAPI Base Toolkit: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html
+.. _Intel Distribution for Python*: https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html
+.. _Intel AI Analytics Toolkit: https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html
 .. _Data Parallel Extensions for Python*: https://intelpython.github.io/DPEP/main/
 .. _Intel VTune Profiler: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 .. _Intel Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/advisor.html
diff --git a/docs/sources/overview.rst b/docs/sources/overview.rst
index befbd6b37d..7b0b760340 100644
--- a/docs/sources/overview.rst
+++ b/docs/sources/overview.rst
@@ -1,44 +1,98 @@
 .. _overview
+.. include:: ./ext_links.txt
 
 Overview
 ========
 
+Data-Parallel Extensions for Numba* (`numba-dpex`_) is a standalone extension to
+the `Numba`_ JIT compiler. The extension adds two new features to Numba: an
+architecture-agnostic kernel programming API, and a backend extension that can
+parallelize NumPy-style array expressions and function calls on different types
+of data-parallel architectures.
 
-Numba data-parallel extension (`numba-dpex
-`_) is an Intel |reg|-developed
-extension to the `Numba `_ JIT compiler.
+Numba-dpex is part of `Intel AI Analytics Toolkit`_ and distributed with the
+`Intel Distribution for Python*`_. The extension is also available on Anaconda
+Cloud and as a Docker image on GitHub. Please refer to the
+:doc:`getting_started` page to learn more.
 
-Numba-dpex extends Numba* by adding a kernel programming API based on `SYCL
-`_ and compilation support for
-Data-parallel Extension For NumPy*
-(`dpnp `_) a drop-in replacement for
-NumPy* based on SYCL.
+Main Features
+-------------
+- :doc:`user_manual/kernel_programming/index`
+  The kernel API has a design similar to what is provided by Numba's
+  ``cuda.jit`` module. However, the API uses the `SYCL`_ language runtime and
+  as such is extensible to several hardware categories. Presently, the API
+  supports only SPIR-V-based OpenCL and `oneAPI Level Zero`_ devices that are
+  supported by the Intel® `DPC++`_ SYCL compiler runtime.
 
-The
-extension adds kernel programming and automatic offload capabilities to the
-Numba compiler. Numba-dpex is part of `Intel oneAPI Base Toolkit
-`_
-and distributed with the `Intel Distribution for Python*
-`_.
-The goal of the extension is to make it easy for Python programmers to
-write efficient and portable code for a mix of architectures across CPUs, GPUs,
-FPGAs and other accelerators.
+  A simple vector addition kernel can be expressed using the API as follows:
+
+  .. code-block:: python
+
+      import dpnp
+      import numba_dpex as dpex
+
+
+      @dpex.kernel
+      def sum(a, b, c):
+          i = dpex.get_global_id(0)
+          c[i] = a[i] + b[i]
+
+
+      a = dpnp.ones(1024, device="gpu")
+      b = dpnp.ones(1024, device="gpu")
+      c = dpnp.empty_like(a)
+
+      sum[dpex.Range(1024)](a, b, c)
+      print(c)
+
+  In the above example, since the programmer allocated the dpnp arrays on the
+  default ``gpu`` device, numba-dpex will compile and then execute the kernel
+  for that device. To change the execution target to a CPU, only the
+  ``device`` keyword needs to be changed to ``cpu`` when allocating the dpnp
+  arrays.
+
+- :doc:`user_manual/auto-offload`
+
+  The new backend extension that adds automatic parallelization support has a
+  user interface similar to Numba's existing loop-parallelizer. The feature
+  enables a programmer to "offload" NumPy-style vector expressions, library
+  calls, and ``prange`` loops to different hardware and execute them in
+  parallel. A key difference from Numba's loop-parallelizer is the ability to
+  parallelize on devices other than multicore CPUs.
+
+  The feature requires the
+  `dpnp `_ library, a data-parallel
+  drop-in replacement for `NumPy*`_.
+
+
+  A programmer only needs to replace NumPy* function calls, array expressions,
+  and loops with the corresponding dpnp API and array type, and use
+  numba-dpex's decorator in place of the default Numba decorator, to
+  parallelize the expressions on different types of hardware.
+
+Contributing
+============
+
+Refer to the `contributing guide
+`_ for
+information on coding style and standards used in numba-dpex.
+
+License
+=======
+
+Numba-dpex is licensed under the Apache License 2.0, which can be found in
+`LICENSE `_. All usage and
+contributions to the project are subject to the terms and conditions of this
+license.
 
-Numba-dpex provides an API to write data-parallel kernels directly in Python and
-compiles the kernels to a lower-level kernels that are executed using a `SYCL
-`_ runtime library. Presently, only Intel's
-`DPC++ `_
-SYCL runtime is supported via the `dpctl
-`_ package, and only OpenCL and Level Zero
-devices are supported. Support for other SYCL runtime libraries and hardwares
-may be added in the future.
 
 Along with the kernel programming API an auto-offload feature is also provided.
 The feature enables automatic generation of kernels from data-parallel NumPy
 library calls and array expressions, Numba ``prange`` loops, and `other
 "data-parallel by construction" expressions
 `_ that Numba is
-able to parallelize. Following two examples demonstrate the two ways in
-which kernels may be written using numba-dpex.
+able to parallelize. The following two examples demonstrate the two ways in
+which kernels may be written using numba-dpex.
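A minimal sketch of the auto-offload workflow described in the overview changes
above, assuming numba-dpex exposes a ``dpjit`` decorator for the
parallelization backend; the decorator name and the example itself are
illustrative and are not part of this patch:

.. code-block:: python

    import dpnp
    import numba_dpex as dpex
    from numba import prange


    # The only changes from a plain Numba version are the dpnp arrays and the
    # numba-dpex decorator used in place of Numba's default jit decorator.
    # ``dpjit`` is assumed here for illustration; it is not named in this diff.
    @dpex.dpjit
    def vector_sum(a, b, c):
        # A ``prange`` loop over dpnp arrays; the loop is parallelized for the
        # device on which the arrays were allocated.
        for i in prange(c.shape[0]):
            c[i] = a[i] + b[i]


    a = dpnp.ones(1024, device="gpu")
    b = dpnp.ones(1024, device="gpu")
    c = dpnp.empty_like(a)

    vector_sum(a, b, c)
    print(c)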