Documentation for rocThrust is available at https://rocm.docs.amd.com/projects/rocThrust/en/latest/.
- Added extended tests to `rtest.py`. These are extra tests that did not fit the criteria of smoke and regression tests, and they take much longer to run than smoke and regression tests. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests.
- Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests.
- Added smoke test options, which run a subset of the unit tests and ensure that less than 2 GB of VRAM will be used. Use `python rtest.py [--emulation|-e|--test|-t]=smoke` to run these tests.
- Added `--emulation` option for `rtest.py`
- Merged changes from upstream CCCL/thrust 2.4.0
- Merged changes from upstream CCCL/thrust 2.5.0
- Added `find_first_of` to HIPSTDPAR
- Added `search` and `find_end` to HIPSTDPAR
- Added `search_n` to HIPSTDPAR
- Updated HIPSTDPAR's `adjacent_find` to use rocPRIM's implementation
- Changed the C++ version from 14 to 17. C++14 will be deprecated in the next major release.
- `--test|-t` is no longer a required flag for `rtest.py`. Instead, the user can use either `--emulation|-e` or `--test|-t`, but not both.
- Split the contents of HIPSTDPAR's forwarding header into several implementation headers.
- Fixed `copy_if` to work with large data types (512 bytes)
- Merged changes from upstream CCCL/thrust 2.3.2
  - Only the NVIDIA backend uses `tuple` and `pair` types from libcu++. Other backends continue to use the original Thrust implementations and hence do not require libcu++ (CCCL) as a dependency.
- Added the `thrust::hip::par_det` execution policy to enable bitwise reproducibility on algorithms that are not bitwise reproducible by default.
- Updated the default value for the `-a` argument from `rmake.py` to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201`.
- Enabled the upstream (thrust) test suite for execution by default. It can still be disabled by CMake option `-DENABLE_UPSTREAM_TESTS=OFF`.
- Fixed an issue in `rmake.py` where the list storing cmake options would contain individual characters instead of a full string of options.
- Fixed the HIP backend not passing `TestCopyIfNonTrivial` from the upstream (thrust) test suite.
- Fixed tests failing when compiled with `-D_GLIBCXX_ASSERTIONS=ON`.
- Merged changes from upstream CCCL/thrust 2.2.0
  - Updated the contents of `system/hip` and `test` with the upstream changes to `system/cuda` and `testing`
- Added HIPSTDPAR library as part of rocThrust.
- Updated internal calls to `rocprim::detail::invoke_result` to use the public API `rocprim::invoke_result`.
- Use `rocprim::device_adjacent_difference` for `adjacent_difference` API call.
- Updated internal use of custom iterator in `thrust::detail::unique_by_key` to use rocPRIM's `rocprim::unique_by_key`.
- Updated `adjacent_difference` to make use of `rocprim::adjacent_difference` when iterators are comparable and not equal; otherwise it uses `rocprim::adjacent_difference_inplace`.
- Fixed incorrect implementation of `thrust::optional<T&>::emplace()`.
- `thrust::reduce_by_key` outputs are not bit-wise reproducible, as run-to-run results for pseudo-associative reduction operators (e.g. floating-point arithmetic operators) are not deterministic on the same device.
- Note that currently, rocThrust memory allocation is performed in such a way that most algorithmic API functions cannot be called from within hipGraphs.
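
The reproducibility note above rests on floating-point addition being only pseudo-associative: the same values summed in a different order can give a different result. The following is a minimal sketch of that effect in plain standard C++, assuming IEEE 754 single precision and no fast-math flags. It does not call rocThrust, and the helper names `sum_left_to_right` and `sum_right_to_left` are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds the elements in index order (first to last).
float sum_left_to_right(const std::vector<float>& values)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        acc += values[i];
    }
    return acc;
}

// Hypothetical helper: adds the same elements in reverse order (last to first).
float sum_right_to_left(const std::vector<float>& values)
{
    float acc = 0.0f;
    for (std::size_t i = values.size(); i > 0; --i)
    {
        acc += values[i - 1];
    }
    return acc;
}
```

For the input `{1.0e8f, -1.0e8f, 1.0f}`, the two helpers return different sums, because adding `1.0f` to `-1.0e8f` is lost to rounding while adding it to `0.0f` is not. A parallel reduction that combines partial results in a run-dependent order can differ from run to run for the same reason.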
- Updated to match upstream Thrust 2.0.1
- NV_IF_TARGET macro from libcu++ for NVIDIA backend and HIP implementation for HIP backend.
- The CMake build system now accepts `GPU_TARGETS` in addition to `AMDGPU_TARGETS` for setting the targeted GPU architectures. `GPU_TARGETS=all` will compile for all supported architectures. `AMDGPU_TARGETS` is only provided for backwards compatibility; `GPU_TARGETS` should be preferred.
- Removed cub symlink from the root of the repository.
- Removed support for deprecated macros (THRUST_DEVICE_BACKEND and THRUST_HOST_BACKEND).
- Fixed a segmentation fault when binary search / upper bound / lower bound / equal range was invoked with `hip_rocprim::execute_on_stream_base` policy.
- The `THRUST_HAS_CUDART` macro, which is no longer used in Thrust (it's provided only for legacy support), is replaced with `NV_IF_TARGET` and `THRUST_RDC_ENABLED` in the NVIDIA backend. The HIP backend doesn't have a `THRUST_RDC_ENABLED` macro, so some branches in Thrust code may be unreachable in the HIP backend.
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
- Fixed issue where `transform_iterator` would not compile with `__device__`-only operators.
- Updated `docs` directory structure to match the standard of rocm-docs-core.
- Removed references to and workarounds for deprecated hcc
- Updates to match upstream Thrust 1.17.2
- `partition_copy` now uses `rocprim::partition_two_way` for increased performance
- `set_difference` and `set_intersection` no longer hang if the number of items is above `UINT_MAX` (the unit tests for `set_difference` and `set_intersection` used to fail the `TestSetDifferenceWithBigIndexes`)
- Updates to match upstream Thrust 1.16.0
- rocThrust functionality dependent on device malloc is functional (ROCm 5.2 reenabled device malloc); you can now use device-launched `thrust::sort` and `thrust::sort_by_key`
- Packages for tests and benchmark executables on all supported operating systems using CPack
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.15.0
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Updates to match upstream Thrust 1.13.0
- Updates to match upstream Thrust 1.14.0
- Added async scan
- Scan algorithms: `inclusive_scan` now uses the `input-type` as `accumulator-type`; `exclusive_scan` uses `initial-value-type`
  - This changes the behavior of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)
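
To illustrate why the accumulator type matters for the scan entry above, here is a minimal sketch in plain standard C++. It is not the rocThrust implementation: `inclusive_scan_with` is a hypothetical helper whose accumulator type is chosen explicitly, and it uses an unsigned 8-bit input (rather than the `short` mentioned in the entry) as an assumption to keep the wraparound well defined.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: sequential inclusive scan that keeps its running
// total in Acc and writes each partial sum to the output as Out.
template <typename Acc, typename Out, typename In>
std::vector<Out> inclusive_scan_with(const std::vector<In>& input)
{
    std::vector<Out> output;
    output.reserve(input.size());

    Acc running{}; // value-initialized to zero
    for (const In& x : input)
    {
        // The sum is narrowed back to Acc on every step, so a small
        // accumulator wraps even when Out could hold the full value.
        running = static_cast<Acc>(running + x);
        output.push_back(static_cast<Out>(running));
    }
    return output;
}
```

With the input `{200, 100, 50}` stored as `std::uint8_t`, `inclusive_scan_with<std::uint8_t, int>(input)` wraps at 256 and yields `{200, 44, 94}`, while `inclusive_scan_with<int, int>(input)` yields `{200, 300, 350}`. Accumulating in the input type, as the entry describes for `inclusive_scan`, corresponds to the first call.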
- Initial HIP on Windows support
- Packaging has changed to a development package (called `rocthrust-dev` for `.deb` packages and `rocthrust-devel` for `.rpm` packages). Because rocThrust is a header-only library, there is no runtime package. To aid in the transition, the development package sets the `provides` field to `rocthrust`, so that existing packages that are dependent on rocThrust can continue to work. This `provides` feature is introduced as a deprecated feature because it will be removed in a future ROCm release.
- `async_copy`, `partition`, and `stable_sort_by_key` unit tests are failing for HIP on Windows
- Mixed-type exclusive scan algorithm is not using the initial value type for the results type
- gfx1030 support
- AddressSanitizer build option
- async_transform unit test failure
- Updates to match upstream Thrust 1.11
- gfx90a support
- gfx803 support re-enabled
- Updates to match upstream Thrust 1.10
- rocThrust now requires CMake version 3.10.2 or greater
- Size zero inputs are now properly handled with newer ROCm builds, which no longer allow zero-size kernel grid/block dimensions
- Warning of unused results
- There are no changes with this release
- Updated to upstream Thrust 1.10.0
- Implemented runtime error for unsupported algorithms and disabled respective tests
- Updated CMake to use downloaded rocPRIM
- `copy_if` on device test case
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work. Avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.8
- New test cases for device-side algorithms
- Bug for binary search
- Implemented workarounds for `hipStreamDefault` hang
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work. Avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work. Avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Updated to upstream Thrust 1.9.4
- Package dependency has changed to rocPRIM only
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work. Avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- We've disabled ROCm support for device malloc. As a result, rocThrust functionality dependent on device malloc does not work. Avoid using device-launched `thrust::sort` and `thrust::sort_by_key`. Note that host-launched functionality is not impacted.
  - A partial enablement of device malloc is possible by setting `HIP_ENABLE_DEVICE_MALLOC` to 1.
  - `thrust::sort` and `thrust::sort_by_key` may work on certain input sizes, but we don't recommend this for production code.
- Improved tests with fixed and random seeds for test data
- CMake searches for rocThrust locally first; if it isn't found, CMake downloads it from GitHub
- HCC build has been deprecated