Improved efficiency of pybind11 classes and type casters.
Type casters in pybind11 generated with the `PYBIND11_TYPE_CASTER` macro rely on the default constructor of the C++ type. The default-constructed value is then overwritten by the `load` method. C++ types such as `sycl::device` and `sycl::queue` do a non-trivial amount of work in their default constructors. This work ranked high in certain workloads (such as `example/pybind11/onemkl_gemv/sycl_timing_solver.py`).

The same applies to the auto-generated type casters for the classes `dpctl::memory::usm_memory` and `dpctl::tensor::usm_ndarray`.

This PR introduces the `DPCTL_TYPE_CASTER` macro, which defines a `unique_ptr` to the C++ type rather than the type itself. This avoids the superfluous call to the default constructor altogether.

Instead of redefining the `pyobject_caster` automatically generated for `dpctl::tensor::usm_ndarray` and `dpctl::memory::usm_memory`, a singleton class `dpctl::detail::dpctl_api` is added that creates the objects used by the default constructors of these classes.
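
The effect of holding a `unique_ptr` instead of the value can be illustrated with a minimal, pybind11-free sketch. This is not the actual `DPCTL_TYPE_CASTER` macro: `ExpensiveHandle`, `ValueCaster`, `PointerCaster`, and `default_ctor_calls_for` are hypothetical names, and the `int` source stands in for a Python object. The sketch only counts how often the default constructor runs under each storage strategy.

```cpp
#include <cassert>
#include <memory>

// Counts default constructions of ExpensiveHandle (illustration only).
int g_default_ctor_calls = 0;

// Hypothetical stand-in for a type such as sycl::queue whose default
// constructor does non-trivial work.
struct ExpensiveHandle {
    int id;
    ExpensiveHandle() : id(0) { ++g_default_ctor_calls; }
    explicit ExpensiveHandle(int id_) : id(id_) {}
};

// Stores the value directly: constructing the caster default-constructs
// the value, and load() then overwrites it.
struct ValueCaster {
    ExpensiveHandle value;
    bool load(int src) {
        value = ExpensiveHandle(src);
        return true;
    }
    ExpensiveHandle &get() { return value; }
};

// Stores a unique_ptr: constructing the caster creates only a null
// pointer, and the value is constructed exactly once, inside load().
struct PointerCaster {
    std::unique_ptr<ExpensiveHandle> value;
    bool load(int src) {
        value = std::make_unique<ExpensiveHandle>(src);
        return true;
    }
    ExpensiveHandle &get() { return *value; }
};

// Builds a caster, loads a value, and reports how many times the
// default constructor of ExpensiveHandle ran along the way.
template <typename Caster>
int default_ctor_calls_for(int src) {
    g_default_ctor_calls = 0;
    Caster caster;
    bool ok = caster.load(src);
    assert(ok && caster.get().id == src);
    (void)ok;
    return g_default_ctor_calls;
}
```

Under these assumptions, `default_ctor_calls_for<ValueCaster>(7)` observes one default construction and `default_ctor_calls_for<PointerCaster>(7)` observes none. The trade-off is one heap allocation per loaded value, which is presumably cheaper than the default constructors of the SYCL types discussed above.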