All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Improved performance of copy-and-cast operations from `numpy.ndarray` to `tensor.usm_ndarray` for contiguous inputs gh-1829
- Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
- Improved performance of `tensor.argsort` function for all types gh-1859
- Improved performance of `tensor.sort` and `tensor.argsort` for short arrays in the range [16, 64] elements gh-1866
- Implement radix sort algorithm to be used in `dpt.sort` and `dpt.argsort` gh-1867
- Extended `dpctl.SyclTimer` with `device_timer` keyword, implementing different methods of collecting device times gh-1872
- Improved performance of `tensor.cumulative_sum`, `tensor.cumulative_prod`, `tensor.cumulative_logsumexp` as well as performance of boolean indexing gh-1923
- Improved performance of `tensor.min`, `tensor.max`, `tensor.logsumexp`, `tensor.reduce_hypot` for floating point type arrays by at least 2x gh-1932
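For readers unfamiliar with the radix sort mentioned in gh-1867, the general idea can be sketched in plain Python. This is only an illustration of a textbook least-significant-digit radix sort for non-negative integers; the function name `radix_sort` and the `bits_per_pass` parameter are hypothetical and say nothing about how the SYCL kernels behind `dpt.sort` are actually written.

```python
def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers; returns a new sorted list.

    Illustrative sketch only, not dpctl's implementation.
    """
    out = list(values)
    if not out:
        return out
    if min(out) < 0:
        raise ValueError("this sketch handles non-negative integers only")

    radix = 1 << bits_per_pass
    mask = radix - 1
    max_value = max(out)
    shift = 0
    # Process one digit (group of bits) per pass, least significant first.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        # Concatenating buckets in order keeps each pass stable.
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


data = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_data = radix_sort(data)
```

Each pass is a stable bucketing step, so the number of passes depends on the key width rather than on comparisons between elements.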
- Fix for `tensor.result_type` when all inputs are Python built-in scalars gh-1877
- Improved error in constructors `tensor.full` and `tensor.full_like` when provided a non-numeric fill value gh-1878
- Added a check for pointer alignment when copying to C-contiguous memory gh-1890
- Fixed incorrect result (issue gh-1901) in `tensor.cumulative_sum` and in advanced indexing gh-1902
- Update black version used in Python code style workflow gh-1828
- Fixed CI/CD workflow for building conda packages on Windows gh-1831
- Do not use Mambaforge variant of miniforge as deprecated gh-1844
- Use pybind11=2.13.6 gh-1845
- Remove unnecessary include in C++ header file gh-1846
- Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library gh-1847
- Fix warning in documentation generation caused by `diff` docstring gh-1855
- Fix additional warnings when generating docs gh-1861
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Add support of CV-qualifiers in `is_complex<T>` helper gh-1900
- Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
- Support for Python 3.13 for `dpctl` gh-1941
- Add missing include of SYCL header to "math_utils.hpp" gh-1899
- Fix for `tensor.result_type` when all inputs are Python built-in scalars gh-1904
- Updated installation instructions gh-1862
This release reaches an important milestone by making offloading fully asynchronous. Calls to `dpctl.tensor` functions submit tasks for execution to the DPC++ runtime and return without waiting for these tasks to finish. The sequential semantics a user expects from executing a Python script are nevertheless preserved.
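As a loose analogy only, the behavior described above (submission returns immediately, yet results respect program order) can be mimicked with Python's standard library. The sketch below does not use dpctl or SYCL; the single-worker executor merely stands in for an in-order queue, and `slow_double` is a hypothetical placeholder for a device task.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_double(x):
    """Stand-in for a device task that takes a while to finish."""
    time.sleep(0.05)
    return 2 * x


# A single worker executes tasks in submission order, like an in-order queue.
with ThreadPoolExecutor(max_workers=1) as in_order_queue:
    start = time.perf_counter()
    first = in_order_queue.submit(slow_double, 21)
    # The second task consumes the first task's result; it is queued behind it.
    second = in_order_queue.submit(lambda: first.result() + 1)
    submit_seconds = time.perf_counter() - start  # submission returns promptly

    # Blocking happens only when the host actually needs the value.
    final_value = second.result()

print(final_value, f"(submitted in {submit_seconds:.4f} s)")
```

The host code reads sequentially, while the work itself runs in the background until a result is requested.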
The full list of changes that went into this release is:
- Implement `tensor.take_along_axis` per Python Array API specification gh-1778
- Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` gh-1798
- Support for 'device=tensor.kDLCPU' in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method gh-1781
- Support DLPack on Windows gh-1746
- Implement `tensor.nextafter` function per Python Array API specification gh-1730
- Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification gh-1732, gh-1780
- Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from `'C'` to `'K'` gh-1808
- Support for 'max dimensions' in Array API capabilities info data gh-1774
- Add support for device aspect 'emulated' gh-1691
- `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
- Add support for COVERAGE build type in project's CMake script gh-1692
- Change ownership of USM allocation by `dpctl.memory` objects, make executions of `dpctl.tensor` operations asynchronous gh-1705
- Add support for Python scalars by `tensor.where` function gh-1719
- Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` gh-1820
- Use transcendental functions from `sycl` namespace instead of `std` namespace gh-1707
- Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
- Array creation function `tensor.zeros` to use asynchronous `memset` operation gh-1806
- The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value gh-1786
- Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
- No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
- Update version of 'pybind11' used gh-1758, gh-1812
- Handle possible exceptions by `usm_host_allocator` used with `std::vector` gh-1791
- Use `dpctl::tensor::alloc_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations gh-1797
- Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` gh-1827, gh-1830
- Fix setting of release variable Sphinx config file gh-1685
- Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` gh-1690
- Add license header to conda script files gh-1695
- Fix `tensor.round` behavior on CUDA devices gh-1700
- Add missing `#include <sstream>` gh-1701
- Fix for issue 1724 gh-1728
- Correct USM type for return array of `tensor.extract` function gh-1727
- Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type gh-1741
- Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function gh-1756
- `tensor.clip` to handle Python scalars which are out of bound for the data type of integral array gh-1759
- Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
- Element-wise `tensor.divide` and comparison operations allow greater range of Python integer and integer array combinations gh-1771
- Fix for unexpected behavior when using floating point types for array indexing gh-1792
- Enable `pytest --pyargs dpctl.tests` gh-1833
- Fix for undefined behavior in indexing using integer arrays gh-1894
- Improve performance of `test_sort_complex_fp_nan` gh-1704
- Improve exception wording raised by `tensor.broadcast_arrays()` gh-1720
- Remove `template` keyword in method call of `sycl::kernel_bundle` gh-1726
- Backport changelog edits from maintenance/0.17.x gh-1736
- Replace uses of 'intel' channels in docs and readme file gh-1737
- Update references to deprecated environment variable `SYCL_DEVICE_FILTER` gh-1740
- Correction for installation instruction steps gh-1754
- Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
- Add missing include to fix build break with newer LLVM gh-1776
- Add `#include <utility>` for definition of `std::move` used gh-1787
- Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
- Document `tensor._flags.Flags` class gh-1794
- Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
- Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
- Clean-up uses of `Strided1DIndexer` class gh-1805
- Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
- Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` gh-1807
- Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library gh-1801, gh-1840
- Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
- Remove recommendation to install wheels from Anaconda PyPI index gh-1819
- Removed use of post-link and pre-unlink conda scripts in `dpctl` gh-1821
- Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
- A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly: gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, gh-1721, gh-1743, gh-1739, gh-1747, gh-1748, gh-1750, gh-1752, gh-1767, gh-1768, gh-1775, gh-1783, gh-1790, gh-1795, gh-1796, gh-1800, gh-1760, gh-1803, gh-1777, gh-1813, gh-1817, gh-1818
This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions, and complies with revision 2023.12 of Python Array API specification.
- Added pybind11 caster for `sycl::half` to map to/from Python `float` to `"dpctl4pybind11.hpp"` header: gh-1655
- Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
- Implemented `tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp`: gh-1602
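To illustrate what the three cumulative reductions listed above compute, here is a minimal pure-Python sketch over a one-dimensional list. It relies only on the standard library; the helper `logaddexp` is a hypothetical stand-in written for this example, and the sketch makes no claim about how the `dpctl.tensor` kernels are implemented.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for finite inputs."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


values = [1.0, 2.0, 3.0, 4.0]

# Each output element combines all inputs up to and including its position.
running_sum = list(accumulate(values))
running_prod = list(accumulate(values, operator.mul))
running_logsumexp = list(accumulate(values, logaddexp))
```

Here `running_sum` is `[1.0, 3.0, 6.0, 10.0]` and `running_prod` is `[1.0, 2.0, 6.0, 24.0]`; the last entry of `running_logsumexp` equals the logarithm of the sum of exponentials of all inputs, up to rounding.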
- Expanded documentation for `dpctl`: gh-1619
- Expanded `utils.intel_device_info` functionality: gh-1656
- Improved performance of elementwise operations: gh-1651
- Efficiency improvement by avoiding unnecessary copying of `sycl::queue`: gh-1645
- `dpctl` uses pybind11 2.12.0: gh-1640
- Improved performance of `tensor.reshape` operation with `order="F"` when copying is needed, or requested: gh-1677
- Fixed initialization of byte type constants in `dpctl_capi` Python/C API loader class in `"dpctl4pybind11.hpp"`: gh-1665
- Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: gh-1676
- Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
- Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
- Support use of index arrays of different integral types in indexing operations: gh-47
- Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
- Corrected `tensor.tile` for scalar inputs and empty repetitions: gh-1628
- Fixed support for `out` keyword in `tensor.matmul`: gh-1610
- Fixed bug in basic slicing of empty arrays: gh-1680
- Fixed bug in `tensor.bitwise_invert` for boolean input array: gh-1681
- Fixed bug in `tensor.repeat` on zero-size input arrays: gh-1682
- Fixed bug in `tensor.searchsorted` for 0d needle vector and strided hay: gh-1694
This is a bug-fix release, which also provides a change needed by the `numba_dpex` project to support dispatching kernels consuming instances of the `sycl::local_accessor` template type.
- Changed behavior of `dpctl.tensor.usm_ndarray.__dlpack_device__` method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
- Array creation functions and the `usm_ndarray` constructor in `dpctl.tensor` submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of `axis` keyword for `dpctl.tensor.tensordot` and `dpctl.tensor.vecdot` to align with Python Array API 2023.12 specification: #1608
- Changed implementation of `DPCTLQueue_SubmitRange`, `DPCTLQueue_SubmitNDRange` in DPCTLSyclInterface library to support `sycl::local_accessor` arguments needed by `numba_dpex`; changed the enum `DPCTLKernelArgType` to correspond to C++ disjoint types: #1609, #1611, #1612
- Fixed a crash on Windows platform during execution of getter of `dpctl.SyclPlatform.default_context` property: #1604
- Fixed kernel submission error on NVidia CUDA GPUs during `dpctl.tensor.matmul` operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from `dpctl.tensor.tensordot` reported in issue #1570: #1608
- Fixed library name output by `python -m dpctl --library`: #1615
This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older. Featurewise, this release is identical to 0.15.1.
This release reaches the milestone of 100% compliance of `dpctl.tensor` functions with Python Array API 2022.12 standard for the main namespace.
- Added reduction functions `dpctl.tensor.min`, `dpctl.tensor.max`, `dpctl.tensor.argmin`, `dpctl.tensor.argmax`, and `dpctl.tensor.prod` per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of `dpctl.tensor.usm_ndarray` type: #1431, #1447
- Added new elementwise functions `dpctl.tensor.cbrt`, `dpctl.tensor.rsqrt`, `dpctl.tensor.exp2`, `dpctl.tensor.copysign`, `dpctl.tensor.angle`, and `dpctl.tensor.reciprocal`: #1443, #1474
- Added statistical functions `dpctl.tensor.mean`, `dpctl.tensor.std`, `dpctl.tensor.var` per Python Array API specifications: #1465
- Added sorting functions `dpctl.tensor.sort` and `dpctl.tensor.argsort`, and set functions `dpctl.tensor.unique_values`, `dpctl.tensor.unique_counts`, `dpctl.tensor.unique_inverse`, `dpctl.tensor.unique_all`: #1483
- Added linear algebra functions from the Array API namespace `dpctl.tensor.matrix_transpose`, `dpctl.tensor.matmul`, `dpctl.tensor.vecdot`, and `dpctl.tensor.tensordot`: #1490, #1525, #1541
- Added `dpctl.tensor.clip` function: #1444, #1505
- Added custom reduction functions `dpt.logsumexp` (reduction using binary function `dpctl.tensor.logaddexp`), `dpt.reduce_hypot` (reduction using binary function `dpctl.tensor.hypot`): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
- Added `dpctl.utils.intel_device_info` function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, `dpctl.SyclDevice.max_mem_alloc_size` and `dpctl.SyclDevice.max_clock_frequency`: #1530
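The custom reductions listed above are described as reductions over a binary function (`logaddexp` and `hypot`). The sketch below shows that idea in plain Python with `functools.reduce`. The helper names are hypothetical, written for this example only; the sketch illustrates the mathematical definition, not the parallel algorithm used by dpctl.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)); -inf acts as the identity element."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp(values):
    """Fold logaddexp over the inputs, starting from its identity (-inf)."""
    return reduce(logaddexp, values, -math.inf)


def reduce_hypot(values):
    """Fold hypot over the inputs, starting from its identity (0.0)."""
    return reduce(math.hypot, values, 0.0)


lse = logsumexp([1000.0, 1000.0])  # direct exp(1000.0) would overflow
norm = reduce_hypot([3.0, 4.0])
```

Folding with `logaddexp` avoids overflow that a direct `log(sum(exp(x)))` would hit for large inputs, and folding with `hypot` yields the Euclidean norm of the inputs.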
- Functions `dpctl.tensor.result_type` and `dpctl.tensor.can_cast` became device-aware: #1488, #1473
- Implementation of method `dpctl.SyclEvent.wait_for` changed to use `sycl::event::wait` instead of `sycl::event::wait_and_throw`: gh-1436
- `dpctl.tensor.astype` was changed to support `device` keyword as per Python Array API specification: #1511
- C++ header files in `libtensor/include/kernels` containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
- Fixed issues with `dpctl.tensor.repeat` support for `axis` keyword: #1427, #1433
- Fix for gh-1503 for bug in `usm_ndarray.__setitem__`: #1504
- Other bug fixes: #1485, #1477, #1512
- Added `dpctl.tensor.floor`, `dpctl.tensor.ceil`, `dpctl.tensor.trunc` elementwise functions.
- Added `dpctl.tensor.hypot`, `dpctl.tensor.logaddexp` elementwise functions.
- Added trigonometric (`dpctl.tensor.sin`, `dpctl.tensor.cos`, `dpctl.tensor.tan`) and hyperbolic (`dpctl.tensor.sinh`, `dpctl.tensor.cosh`, `dpctl.tensor.tanh`) elementwise functions and their inverses (`dpctl.tensor.asin`, `dpctl.tensor.asinh`, `dpctl.tensor.acos`, `dpctl.tensor.acosh`, `dpctl.tensor.atan`, `dpctl.tensor.atanh`).
- Added `dpctl.tensor.round` function.
- Added `dpctl.tensor.sign` and `dpctl.tensor.remainder` elementwise functions.
- Added bitwise elementwise functions `dpctl.tensor.bitwise_and`, `dpctl.tensor.bitwise_xor`, `dpctl.tensor.bitwise_or`, `dpctl.tensor.bitwise_invert`.
- Added bitwise shift functions `dpctl.tensor.bitwise_left_shift` and `dpctl.tensor.bitwise_right_shift`.
- Added `dpctl.tensor.atan2` and `dpctl.tensor.signbit` elementwise functions.
- Added `dpctl.tensor.minimum` and `dpctl.tensor.maximum` binary elementwise functions.
- Supported equality checking and hashing for `dpctl.SyclPlatform`.
- Implemented `types` property for all unary and binary elementwise functions #1361
- Added `dpctl.tensor.repeat` and `dpctl.tensor.tile` functions.
- Added `dpctl.tensor.matrix_transpose` function.
- Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for `dpctl.tensor.usm_ndarray` type #1324.
- Removed `dpctl.tensor.numpy_usm_shared` obsolete class and associated tests which were being skipped #1310
- Transitioned `dpctl` codebase to Cython 3.
- Improved performance of boolean reduction functions `dpctl.tensor.all` and `dpctl.tensor.any`.
- Improved performance of summation function `dpctl.tensor.sum`.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated `DPCTLDevice_GetMaxWorkItemSizes` function from the SyclInterface library.
- Improved performance of `dpctl.tensor.reshape` in the case when a copy is being made.
- Improved performance of `dpctl.tensor.roll` function.
- Fixed issues identified by Coverity security scans.
- Fixed issues #1279, #1350, #1344, #1327, #1241, #1250, #1293.
- Added `dpctl.tensor.log2` and `dpctl.tensor.log10`: #1267
- Added `dpctl.tensor.negative`, `dpctl.tensor.positive`, `dpctl.tensor.square` #1268
- Added `dpctl.tensor.logical_not`, `dpctl.tensor.logical_and`, `dpctl.tensor.logical_or`, `dpctl.tensor.logical_xor` #1270
- `dpctl.tensor.astype` behavior for `newdtype=None` changes #1261
- `dpctl.tensor.usm_ndarray` constructor default value of `dtype` keyword argument changed to `None`: #1265
- Support for `out` arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284
- Added `dpctl.tensor.less_equal`, `dpctl.tensor.greater`, `dpctl.tensor.greater_equal`: #1239
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
- Fixed handling of 0d arrays in `dpctl.tensor.sum`: #1238
- Added support of `axis=None` in `dpctl.tensor.concat` #1125
- Added caching for `dpctl.SyclDevice.filter_string` property #1127
- Added `dpctl.tensor.isdtype` from array API #1133
- Added `dpctl.tensor.unstack`, `dpctl.tensor.moveaxis`, `dpctl.tensor.swapaxes` #1137, #1174
- Allow for mutation of `dpctl.tensor.usm_ndarray.flags.writable` #1141
- Added `dpctl.tensor.where` from array API #1147
- Include libtensor headers in `dpctl` installation layout #1185
- Added new properties of `dpctl.tensor.usm_ndarray` object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: `dpctl.tensor.add`, `dpctl.tensor.divide`, `dpctl.tensor.isnan`, `dpctl.tensor.isinf`, `dpctl.tensor.isfinite`, `dpctl.tensor.cos`, `dpctl.tensor.abs`, `dpctl.tensor.equal`
  - #1205: `dpctl.tensor.sqrt`
  - #1209: implements `out` keyword argument
  - #1211: `dpctl.tensor.multiply`, `dpctl.tensor.subtract`
  - #1214: `dpctl.tensor.not_equal`
  - #1216: `dpctl.tensor.exp`, `dpctl.tensor.sin`
  - #1217: `dpctl.tensor.real`, `dpctl.tensor.imag`, `dpctl.tensor.proj`
  - #1218: `dpctl.tensor.log`, `dpctl.tensor.log1p`, `dpctl.tensor.expm1`
  - #1221: `dpctl.tensor.floor_divide`
  - #1235: `dpctl.tensor.less`
  - #1237: in-place support for addition, multiplication and subtraction
- Added `dpctl.tensor.all` and `dpctl.tensor.any` #1204
- Added `dpctl.tensor.sum` #1210
- Updated examples of native Python extensions built using `dpctl` #1108
- Used security flags to compile and link native extensions of `dpctl` #1109
- Changed types of `dpctl.tensor.finfo` and `dpctl.tensor.iinfo` output structure per array API spec #1110
- Consolidated multiple USM temporaries life-time management `host_task`s to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing `dpctl.tensor` functions #1123
- Changed default value of `mode` keyword in `dpctl.tensor.take` and `dpctl.tensor.put` from `clip` to `wrap` #1132
- Added support for (nested) sequence of `dpctl.tensor.usm_ndarray` objects in `dpctl.tensor.asarray` #1139
- Improved exception handling in `dpctl.tensor.usm_ndarray.__setitem__` special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of `dpctl.tensor.usm_ndarray` printing functionality #1187
- Require DPC++ RT 2023.1 to build and run `dpctl` #1195
- Compile offloading native extensions with `-fno-sycl-id-queries-fit-in-int` fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
- Fix to add empty values check for `dpctl.tensor.place` #1105, #1106
- Fixed gh-1089 by improving `dpctl.tensor.asarray` handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of `dpctl.tensor.usm_ndarray` #1136
- Fixed a bug with boolean advanced indexing #1103
- Added `dpctl.SyclDevice.partition_max_sub_devices` property #1005
- Added `dpctl.program.SyclKernel.max_sub_group_size` property #1028
- Implemented printing of `usm_ndarray` #1013, #1043, #1060
- Implemented support for advanced indexing for `dpctl.tensor.usm_ndarray` #1095, #1097, #1099, #1101
- Implemented support for platform listing in `dpctl.__main__` script #1014
- Improved performance of `dpctl.tensor.asnumpy` #1026
- Added `UsmNDArray_Make*` C-API for constructing `dpctl.tensor.usm_ndarray` from native allocations #1050, #1067
- Added support for `dpctl.SyclDevice.native_vector_width_*` device descriptors #1075
- Added `dpctl::tensor::usm_ndarray::get_shape_vector` and `dpctl::tensor::usm_ndarray::get_strides_vector` methods #1090
- Removed `dpctl.select_host_device`, `dpctl.has_host_device`, `dpctl.SyclDevice.is_host`, and `dpctl.SyclDevice.has_aspect_host` since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
- `usm_ndarray` is made writable by default #1012, and writable flag is now checked by `__setitem__`.
- Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
- Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
- Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
- The `dpctl.tensor.Device` class supports `print_device_info` method #1029, equality comparison, and hashing #1048
- Updated version of pybind11 used to 2.10.2 #1031
- Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
- Changed return type of `DPCTLUSM_GetPointerType` function in SyclInterface library #1061, #1065
- Updated supported version of DLPack to 0.8 #1073
- Implemented queue cache per context/device pair and deployed it in `dpctl.memory`, `dpctl.tensor.from_dlpack` and `dpctl.tensor` array creation functions #1076, #1079
- Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
- Fixed error gh-998 in forming Python exception, #999.
- A small memory leak fixed, #1000
- Improved dtype support in `dpctl.tensor.full`, PR #1002
- Added missing header file #1008 fixing gh-1007
- Fixed a typo in device-specific dtype mapping #1015
- Fixed default device integer type to align with NumPy's behavior on Windows #1017
- Fixed unexpected overflow in `dpctl.tensor.linspace` when one of the parameters is the largest floating point value #1034
- Constructors `dpctl.tensor.empty`, `dpctl.tensor.zeros`, and the `usm_ndarray` constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
- Fixed parameter validation in `dpctl.SyclQueue` constructor #1052
- Fixed `usm_type` of the resulting array in `dpctl.tensor.tril` and `dpctl.tensor.triu` functions #1062
- Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
- Fixed issue with empty argument of `dpctl.tensor.meshgrid` function #1080
- Fixed linking problem on Windows enabling `dpctl` to be functional on Windows for devices not supporting some data types #1083
- Implemented `dpctl.tensor.linspace` function from array-API #875.
- Implemented `dpctl.tensor.eye` function from array-API #896.
- Implemented `dpctl.tensor.tril` and `dpctl.tensor.triu` functions from array-API #910.
- Added data type objects to `dpctl.tensor` namespace, `finfo`, `iinfo`, `can_cast`, and `result_type` functions #913.
- Implemented `dpctl.tensor.meshgrid` creation function from array-API #920.
- Implemented convenience class to represent output of `dpctl.tensor.usm_ndarray.flags` property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added `dpctl.utils.onetrace_enabled` context manager for targeted trace collection #903.
- Added support for `stream` keyword in `__dlpack__` method, enabling support for sending `usm_ndarray` using mpi4py #906.
- `dpctl.tensor.asarray` can now transition data between incompatible devices, #951.
- Introduced `"syclinterface/dpctl_sycl_types_casters.hpp"` header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to `dpctl.program.SyclKernel` and `dpctl.program.SyclProgram`. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added `dpctl.SyclDevice.sub_group_sizes` property to retrieve supported sizes of sub-group by the device #985.
- Improved queue compatibility testing in `dpctl.tensor`'s implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used `os.add_dll_directory` on Windows to ensure that `DPCTLSyclInterface` library can be found #918.
- Refactored `dpctl.tensor`'s implementation module #941 to streamline adding new functionality. Streamlined `dpctl::tensor::usm_ndarray` class implementation.
- Added debugging messaging in case when `DPCTLDynamicLib::getSymbol` encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed `dpctl` to use pybind11 2.10.1 #967.
- Extended `dpctl.tensor.full` to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about `dpctl.tensor.reshape` function #915.
- Fixed unexpected `UnboundLocalError` exception in #922.
- Fixed bugs in `dpctl.tensor.arange` in #945.
- Fixed issue with type inferencing in `dpctl.tensor.asarray` in #949.
- Added missing docstrings for `dpctl.SyclDevice` properties #964.
- Implemented and deployed dedicated kernels for copying with casting #781, used in `__setitem__`, implementation of `asarray`, `dpctl.tensor.copy` functions.
- Implemented dedicated copying kernel for `dpctl.tensor.reshape` function #810, added support for `copy` keyword #807.
- Implemented dedicated kernel to copy with casting from `numpy.ndarray` into `dpctl.tensor.usm_ndarray` #817.
- Implemented `dpctl.tensor.permute_dims` function from array-API #787.
- Implemented `dpctl.tensor.expand_dims` function from array-API #788.
- Implemented `dpctl.tensor.squeeze` function from array-API #790.
- Implemented `dpctl.tensor.broadcast_to` function from array-API #791.
- Implemented `dpctl.tensor.broadcast_arrays` function from array-API #798.
- Implemented `dpctl.tensor.flip` function from array-API #801.
- Implemented `dpctl.tensor.usm_ndarray.mT` property per array-API #805.
- Implemented `dpctl.tensor.roll` function from array-API #809.
- Implemented `dpctl.tensor.arange` function from array-API #814.
- Implemented `dpctl.tensor.zeros` function from array-API #816.
- Implemented `dpctl.tensor.ones`, `dpctl.tensor.full`, `dpctl.tensor.empty_like`, `dpctl.tensor.zeros_like`, `dpctl.tensor.ones_like`, `dpctl.tensor.full_like` functions from array-API #822.
- Implemented `DPCTLQueue_Memset` function in SyclInterface library #812, and exposed it for `dpctl.memory.MemoryUSM*` classes #815.
- Implemented `dpctl.utils.get_coerced_usm_type` to deduce usm type of the output array from types of input arrays in compute-follows-data execution model #797.
- Added `dpctl.SyclDevice.profiling_timer_resolution` property #825.
- Added `dpctl.SyclDevice.platform` and `dpctl.SyclPlatform.default_context` properties #827.
- Provided pybind11 example for functions working on `dpctl.tensor.usm_ndarray` container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with `dpctl.SyclQueue` #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented `dpctl.tensor.concat` function from array-API #867.
- Implemented `dpctl.tensor.stack` function from array-API #872.
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also allows to not rebuild SyclInterface library when building C-test executable.
- Exported `keep_args_alive` utility in `dpctl4pybind11.hpp` header #820. The utility uses `sycl::handler::host_task` to keep given Python arguments alive until each `sycl::event` from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying `dpctl.SyclEvent` to avoid storing Python object previously used to keep kernel arguments scheduled with `dpctl.SyclQueue.submit` #823.
- Fixed docstring for `dpctl.SyclTimer` #824.
- Changed type of exceptions raised on failure to create `dpctl.SyclDevice` from `ValueError` to `dpctl.SyclDeviceCreationError` #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of `dpctl.SyclProgram` from using deprecated `sycl::program` to `sycl::kernel_bundle` #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored `dlpack.h` to version 0.7 #847.
- Fixed `dpctl.lsplatform()` to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of `dpctl.lsplatform(verbosity=2)` and `dpctl.SyclDevice.print_device_info` #866.
- Fixed issue with slicing reported in gh-870 in #871.
- Properties added to MemoryUSM* objects. #647
- Added `dpctl.tensor.asarray` #646
- Implemented DLPack support for usm_ndarray #682
- Exported `dpctl.tensor.Device` class #708 #718
- Added testing of examples in CI #722
- Added user manuals to dpctl documentation #712 #773
- Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
- Added workflow to publish rendered documentation on PRs #673 #753 #726
- Synchronization functions and USM allocation functions release GIL #736 #766
- `dpctl.SyclEvent` destructor is made non-blocking #751
- Fix for issue in code of `dpctl.tensor.usm_ndarray.T` #653
- Fixed issue with `dpctl.tensor.reshape`'s effect on contiguity flags of usm_ndarray #695
- Fixed handling of empty list by `dpctl.tensor.asarray` #694
- Fixed type inference with array of empty arrays in `dpctl.tensor.asarray` #697
- Fixed issue gh-698 with `dpctl.tensor.asarray` #709
- Fixed performance of item assignment from numpy array #724
- `DPCTLDeviceMgr_GetNumDevices` should not operate on rejected devices #737
- Fixed issue gh-729 for `dpctl.tensor.reshape` applied to 0-element usm_ndarray #756
- Fixed issue gh-728 with `dpctl.tensor.astype` #757
- Fixed type in memory overlapping test #770
- Fixed issue with operator.pos for `dpctl.tensor.usm_ndarray` #783
- Only call `PyThread_Ensure` from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721
Full Changelog: https://github.com/IntelPython/dpctl/compare/0.11.4...0.12.0
- Fix tests for nested context factories expecting for integration environment by @PokhodenkoSA in #705
- Set the last byte in allocated char array to zero [cherry picked from #650] #699
- Extending `dpctl.device_context` with nested contexts #678
- Fixed issue #649 about incorrect behavior of `.T` method on sliced arrays #653
- Replaced uses of clang compiler with icx executable #665
- Use Python 3.9 in public CI #599
- Add a new C API utility function (`DPCTLDeviceMgr_GetDeviceInfoStr`) to return the device info as a C string object #620
- New GitHub workflow to build dpctl with nightly Intel llvm/sycl + drivers #621
- Always raise SubDeviceCreationError even when sub-device counts are zero #622
- Updated OpenCL interoperability code to fix build with Intel llvm/sycl bundle #625
- Enabled use of default platform context extension in SYCL compilers that implement this extension #627
- Implemented `dpctl.utils.get_execution_queue(queue_seq)` utility to help implement the "compute-follows-data" convention for offload target #632 #631
- Replaced `host_device` device type with `host` in tests #616
- Rework the logic in the `copy_from_device` method of `dpctl.memory` to work correctly with `host` device #618
- Use `dpctl.device_type.host` instead of `dpctl.device_type.host_device` #626
- Reinstate deprecated `sycl::program` that was conditionally removed from open source DPC++ toolchain #633
- Use `LoadLibraryExA` instead of `LoadLibraryA` to mitigate a possible DLL injection issue when we load the Level Zero DLL on Windows #636
- Github coverage workflow is changed to use oneAPI 2021.3 instead of latest to work around broken profiling instrumentation in DPC++ 2021.4 #614
- Update build dependencies for NumPy #641
- Use "readelf" on SYCL's `pi_level_zero` library to find out and use the exact name of `ze_loader.so` in SyclInterface library #617
- Removed use of DPC++ features deprecated in 2021.4 and open source Intel llvm/sycl compiler #603
- Suppress errant CMake log #610
- Fixes to compile dpctl using Intel llvm/sycl compiler #603
- Fixed a hang by avoiding passing a `nullptr` argument to `sycl::queue::prefetch` #612
- Fixed the logic to return device count #623
- Enabled building of C extensions with dpctl by including header defining `bool` type for C compilers #604
- Added methods `__bool__`, `__float__`, `__int__`, `__index__`, and `__complex__` to usm_ndarray #578
- Added data-API required special methods to usm_ndarray class, as well as to_numpy/from_numpy, astype, reshape functions #586
- Added methods to query dpctl.SyclDevice for size of global/local memory #589
- Added tests for constructors with invalid capsules #577
- Improved test coverage of `dpctl.SyclQueue` implementation #574
- Added a test to exercise API exported function (get_event_ref). #570
- Expanded tests in test_sycl_context to improve coverage #571
- Tweaks to test_sycl_event to improve coverage #567
- Improved coverage of `dpctl.__init__` file and other service functions #563
- Added test for repr and test for default argument to constructor #565
- Added some tests to involve capsule #564
- Added workflow for Public CI on Windows #534
- DPCTLQueue_Memcpy, _Prefetch, _Memadvise become asynchronous #557
- Added device aspect selector, `dpctl.select_device_with_aspects` #558
- Added test based on example from #583
- Parametrized tests for executing OpenCL kernels compiled from source in types of arguments #581
- Temporarily disabled self-hosted CI jobs runner #559
- Changed static method `SyclQueue._create_from_context_and_device` #579
- Transitioned all Python API to use pytest over unittest, improved coverage in dpctl/memory #575
- Changed `dpctl.SyclEvent.profiling_info_submit` from method to a property #573
- Simplified arg parsing in SyclDevice constructor #572
- Used tag with alignment attribute set in README #562
- Moved sycl timer into dpctl.SyclTimer #555
- Used clang-format off, clang-format on to avoid include reordering in pybind11 example #588
- Implemented a workaround for running conda-build using Klocwork #566
- Separated pipelines for Linux and Windows #582
- Fixed inconsistency in `__sycl_usm_array_interface__` of `usm_ndarray` instance #584
- Fixed memory leak: Capsule deleters now free resources for renamed capsules too #568
- Fixed version test to allow for semantic versioning #569
- Improved coverage of _types.pxi #556
- Fixed `UnboundLocalError` when default queue could not be created #554
- Improvements to logic for working with custom DPC++ toolchain #481
- Add SyclContext unit test cases #488
- Consolidate configurations of tools that support PEP 518 into pyproject.toml #486
- Added C-API hash function, used them in Python interface #491
- Add missing extra checks to ensure unwrapped pointer is not Null
- Add error messages to L0 program creation routine
- Improve test coverage for dpctl_sycl_queue_interface #492
- Use pytest.warns in test_lsplatform3 #495
- Added test class to test DRef=nullptr case #496
- Extend parameterized test in test_sycl_queue_interface #497
- Use Memcpy, memadvise in tests
- Expanded types tests by TestQueueSubmitRange
- Added a test that retrieves a DPCPP compiled kernel and submits it via DPCTLQueue_SubmitRange #499, DPCTLEvent_GetCommandExecutionStatus #516, DPCTLEvent_GetWaitList #510 functions
- Propagate compile flags #512
- Add conda package CI pipeline on GitHub Actions #515
- Run tests on GPU #518
- Add 3 wrapper func for event::get_profiling_info #519
- Changes to build_backend.py to enable sycl-compiler-prefix on Windows
- dtype keyword of usm_ndarray now supports np.double and other types #526
- Implemented DPCTLQueue_SubmitBarrier, DPCTLQueue_SubmitBarrierForEvents, SyclQueue.submit_barrier #524
- Added C-API DPCTLQueue_HasEnableProfiling
- Added Python API SyclQueue.has_enable_profiling
- Use public for data owning class definitions
- Queue has enable profiling #531
- Use public for data owning class definitions #533
- Added logic to verify that all bits of property integer were recognized and used #494
- Added support for some properties/methods of underlying device
- A test for properties, method of q mirroring that of device
- Conda build scripts should build wheels in the same setup invocation as install #538
- Added install_requires keyword to setup call
- Added requirements.txt files in dpctl/ and in dpctl/docs #540
- Improved C-API for dpctl Cython classes, added example of using them in Pybind11 extension. #550
- dpctl.SyclEvent acquired ability to get command status and get profiling information. #553
- Moved DPCTLSyclInterface library from MANIFEST.in #482
- Refactored tests
- Use dpcpp compiler package for Linux #514
- Update conda-package.yml
- Static methods _init_helper made into functions and removed from PXD files #532
- Remove imports from future #485
- Fix sub devices #479
- Fix addressof_ref function in `SyclContext` #488
- Follow `DPCTLDevice_CreateFromSelector` which passes the check #487
- Fix a typo in the pytest configuration #490
- Fixed dbg_build.sh script for Linux to use L0
- Reuse IntelSycl_LIBRARY_DIR variable in cmake
- CXX, dpcpp used on Windows too
- Update conda-recipe/bld.bat
- Change to `SyclQueue.__repr__` to reflect properties #531
- Static methods `_init_helper` made into functions and removed from PXD files #532
- Fixed typo in pip installation instruction #536
- Fixed dpctl_config.h, added dpctl_service.h, .cpp #539
- Fixed `__sycl_usm_array_interface__` output for 0d arrays #547
- Implemented support for constructing MemoryUSM* from object with `__sycl_usm_array_interface__` when array-info is not contiguous #400
- Print the backend as part of SyclDevice.print_device_info function #409
- Added dpctl/tensor/_usmarray submodule #427
- Added arg checking to functions in dpctl_sycl_usm_interface.cpp #430
- A static method of _Memory to create from external allocation #430
- Added usm_ndarray accessors #435
- Added Device class representing Data-API notion of device #440
- Added free Python function as_usm_memory(obj) #443 and associated unit tests #449
- Dependency for numpy 1.17 #445
- Add a flag to make doxygen HTML generation optional #450
- Added a feature to get the filter string for a device from Python using the new dpctl.SyclDevice.get_filter_string method. Also added the corresponding DPCTLDeviceMgr_GetPositionInDevices(DRef, device_mask) C API function #453
- New options to setup.py to specify which dpcpp compiler to use, if L0 program creation is to be supported, and to generate code coverage #426
- Github action to check Python code quality #422
- Github action to auto-publish Sphinx docs for master #446
- Github action to generate coverage report and publish to coveralls.io #459
- Rename dpctl.dptensor to dpctl.tensor #407
- Changed repr for Memory objects #442
- Used dpctl.SyclQueue instead of manager and get current queue in tests for SyclProgram #448
- Issue #189 dpctl.memory.MemoryUSMShared(np.int64(16)) should work #392
- Use size_t instead of Py_ssize_t to fit device USM pointer #405
- Various code quality issues identified by flake8 (#417, #419, #420, #422)
- Fixed issues in slicing and array construction #441
- Fixed an issue #447 where dpctl.get_devices does not return devices in the same order as sycl::device::get_devices #451
- L0 program creation support on Windows #319
- Removing public keyword to get_current_queue Cython declaration #437
- Complete support for `sycl::ONEAPI::filter_selector` in dpctl.
- `sycl::platform` #298 creation using opaque pointers.
- A `DPCTLDeviceMgr` module in C API that caches a default context for root devices #277.
- `DPCTLSyclBackendType` and `DPCTLSyclDeviceType` have a new member `ALL` #287.
- C API now provides helper functions to convert between dpctl and SYCL enum values #296.
- Macros to help create opaque vector classes for opaque SYCL types #297.
- `SyclContext` #334, `SyclPlatform` (#336, #298), `SyclQueue` #323 have constructors that recognize filter selectors and closely follow DPC++ interface.
- Add API to get a `PyCapsule` from `SyclQueue`, `SyclContext` instances #350.
- Added `get_queue_ref_from_ptr_and_syclobj(ptr, syclobj)` that creates `DPCTLSyclQueueRef` from a USM pointer and Python object `syclobj` from `__sycl_usm_array_interface__` #380.
- Support for SYCL sub-devices, including sub-device creation, queue, and context creation using sub-devices #343.
- `SyclDevice.parent_device` property to indicate if an instance has a parent device #366.
- Several new getter functions for device info descriptors to device interface (#300, #335, #318, #315, #308).
- Support for SYCL device aspects #307.
- Properties for every `sycl::device` info and aspect that we support in `SyclDevice` #324.
- Support handling async errors inside `SyclQueue` instances #346.
- `get_backend`, `get_platform`, `get_device_type` to Python `SyclDevice` class #300
- A `_sycl_device_factory.pyx` module providing `SyclDevice` constructors using standard `sycl::device_selector` classes (previously in `_sycl_device.pyx`) and a new `get_devices` #277 function to enumerate all devices.
- `_sycl_device_factory.pyx` implements `get_num_devices` and `has_*_device(s)` functions #320.
- Enable Python coverage in CI for Linux #369.
- Use `public` keyword in `_sycl_*.pxd` to generate header files allowing non-Cython centric native extensions to work with dpctl's Python objects #218.
- Documentation improvements #341.
- Rename dpCtl to dpctl in all comments, license headers, and docs. #342
- `dpctl.memory.MemoryUSM*` constructors now use `dpctl.SyclQueue()` instead of `dpctl.get_current_queue()` when the `queue` keyword argument is `None` (default) #382.
- `dpctl.set_default_queue` has been renamed to `dpctl.set_global_queue()` #323.
- Changed `dpctl.dump` to `dpctl.lsplatform` #336.
- Various `SyclDevice` methods related to querying `sycl::info::device` were converted to properties #324.
- Various C API function names were changed.
- Possible crashes when a SYCL platform is not available #349.
- Fix tests which fail if GPU is not available (only CPU is available) #359.
- Fix breaking C API tests #358.
- Bandit warning about "subprocess.check_call(shell=True)" for Windows #306.
- Removed `get_num_platforms`, `has_cpu_queues`, `has_gpu_queues`, `get_num_queues`, `has_sycl_platforms` #320.
- Do not use POP_FRONT in FindDPCPP.cmake so that we can use a cmake version older than 3.15.
- Documentation improvements.
- Cmake improvements and Coverage for C API, Cython and Python.
- Added support for Level Zero devices and queues.
- Added support for SYCL standard device_selector classes.
- SyclDevice instances can now be constructed using filter selector strings.
- Code of conduct.
- Building wheels.
- Queue manager improvements.
- Adding `__array_function__` so that Numpy calls with dparrays work.
- Using clang-format for C/C++ code formatting.
- Using pytest for running tests.
- Add python and cython file coverage.
- Using Bandit for finding common security issues in Python code.
- Add instructions about file headers formats.
- Changed compiler name usage from clang++ to dpcpp.
- Reformat backend.pxd to be closer to black style.
- Remove `cython` from `install_requires`. It allows using `dpCtl` in `numba` extensions.
- Incorrect import in example.
- Consistency of file headers.
- Klocwork issues.
- `_Memory.get_pointer_type` static method which returns kind of USM pointer.
- Utility functions to transform string to device type and back.
- New `dpctl.dptensor.numpy_usm_shared` module containing USM array. USM array extends NumPy ndarray.
- A lot of new examples, including examples of building Cython extensions with DPC++ compiler that interoperate with dpCtl.
- Mechanism for registering a callback function to look and see if the object supports USM.
- setup.py builds C++ backend for develop and install commands.
- Building wheels.
- Use DPC++ runtime from package `dpcpp_cpp_rt`.
- All usage of `DPPL` in C-API functions was changed to `DPCTL`, e.g., `DPPLQueueMgr_GetCurrentQueue` to `DPCTLQueueMgr_GetCurrentQueue`.
- The C-API directory is now called `dpctl-capi` instead of `backends`.
- Refactoring the `dpctl-capi` functions to prepare for changes to add Level Zero program creation.
- `SyclProgram` and `SyclKernel` classes were moved out of `dpctl` into the `dpctl.program` sub-module.
- Klocwork static code analysis warnings.
- Device descriptors "max_compute_units", "max_work_item_dimensions", "max_work_item_sizes", "max_work_group_size", "max_num_sub_groups" and "aspects" for int64 atomics inside dpctl C API and inside the dpctl.SyclDevice class.
- MemoryUSM* classes moved to `dpctl.memory` module, added support for aligned allocation, added support for `prefetch` and `mem_advise` (synchronous) methods, implemented `copy_to_host`, `copy_from_host` and `copy_from_device` methods, pickling support, and zero-copy interoperability with Python objects which implement `__sycl_usm_array_interface__` protocol.
- Helper scripts to generate API documentation for both C API and Python.
- Compiler warnings when building libDPPLSyclInterface and the Cython extensions.
- The Legacy OpenCL interface.
- How the initial active queue is populated inside DPPLQueueMgr.
- dpctl.SyclQueueManager only reports the number of non-host platforms.
- dpctl.SyclQueueManager now raises an exception if DPCTL C API returns a nullptr instead of a valid Sycl queue.
- Several crashes in cases where an OpenCL or Level Zero platform is not available.
- Fix failing platform test case. #116
- Properly skip tests when no OpenCL devices are available.
- Add skip tests to test_sycl_usm.py
- Fix Gtests configuration.
- A crash on Windows due to a Level Zero driver problem. Each device was getting enumerated twice. To handle the issue, we added a temporary fix to use only the first device for each device type and backend #118.
- Changelog was added for dpctl.
- Windows build was fixed.
- Add a helper function to all Python SyclXXX classes to get the address of the base C API pointer as a long.
- Rename PyDPPL to dpCtl in comments (function name renaming to come later)
- Fix bugs highlighted by tools.
- Various code clean ups.
- Dump functions were enhanced to print back-end information.
- dpctl gained support for uint8_t and unsigned long data types.
- oneAPI Beta 10 tool chain support was added.
- dpctl is now aware of DPC++ Sycl PI back-ends. The functionality is now exposed via the context interface.
- C API's queue manager was refactored to require back-end.
- dpctl's device_context now requires back-end, device-type, and device-id to be provided in a string format, e.g. opencl:gpu:0.
- Fixed some important bugs found by static analysis.
- Add dpctl.get_current_device_type().
- Set _cpu_device and _gpu_device to None by default.
- Add get include and include headers.
- DPPL shared objects are installed into dpctl.
- Refactor unit tests.
- Adds C and Cython API for portions of Sycl queue, device, context interfaces.
- Implementing USM memory management.
- Refactored API to expose a minimal sycl::queue interface.
- Modify cpu_queues, gpu_queues and active_queues to functions.
- Change static vectors to static pointers to vectors. This disables calls to their destructors, which would otherwise be called in an undefined order.
- Rename package PyDPPL to dpCtl.
- Use dpcpp.exe on Windows instead of dpcpp-cl.exe deleted in oneAPI beta08.
- Correct use of ERRORLEVEL in conda scripts for Windows.
- Fix using dppl.has_sycl_platforms() and dppl.has_gpu_queues() functions in skipIf