Transitioned dpctl to require DPC++ 2023, removed host related functions, fixed linker crash, enabled SyclKernel.max_sub_group_size property #1028
The issue manifested itself as a "relocation truncated to fit" error, and was caused by the large size of device code produced in a debug build. The suggested solution is to use the link option `-fsycl-link-huge-device-code`. See https://github.com/intel/llvm/blob/sycl/sycl/doc/UsersManual.md#link-options

```
FAILED: dpctl/tensor/_tensor_impl.cpython-39-x86_64-linux-gnu.so
: && /opt/intel/oneapi/compiler/2023.0.0/linux/bin/icpx -fPIC -fsycl -O3 -Wall -Wextra -Winit-self -Wunused-function -Wuninitialized -Wmissing-declarations -fdiagnostics-color=auto -fstack-protector -fstack-protector-all -fpic -fPIC -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -fno-strict-overflow -fno-delete-null-pointer-checks -fsycl -g -Wall -Wextra -Winit-self -Wunused-function -Wuninitialized -Wmissing-declarations -fdiagnostics-color=auto -fstack-protector -fstack-protector-all -fpic -fPIC -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -fno-strict-overflow -fno-delete-null-pointer-checks -fsycl -O0 -ggdb3 -DDEBUG -fsycl-device-code-split=per_kernel -shared -o dpctl/tensor/_tensor_impl.cpython-39-x86_64-linux-gnu.so dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/tensor_py.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/simplify_iteration_space.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/copy_and_cast_usm_to_usm.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/copy_numpy_ndarray_into_usm_ndarray.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/copy_for_reshape.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/linear_sequences.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/eye_ctor.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/full_ctor.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/triul_ctor.cpp.o dpctl/tensor/CMakeFiles/_tensor_impl.dir/libtensor/source/device_support_queries.cpp.o -Wl,-rpath,::::::: && :
/lib/x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/tmp/icpx-a02203/_tensor_impl-411e70.o: in function `sycl.descriptor_reg':
offload.wrapper.object:(.text.startup+0x4): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/icpx-a02203/_tensor_impl-411e70.o: in function `sycl.descriptor_unreg':
offload.wrapper.object:(.text.startup+0x14): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
/tmp/icpx-a02203/_tensor_impl-411e70.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text.startup'
/tmp/icpx-a02203/_tensor_impl-411e70.o:(.eh_frame+0x38): relocation truncated to fit: R_X86_64_PC32 against `.text.startup'
/usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o: in function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in dpctl/tensor/_tensor_impl.cpython-39-x86_64-linux-gnu.so
crtstuff.c:(.text+0x16): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o: in function `register_tm_clones':
crtstuff.c:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in dpctl/tensor/_tensor_impl.cpython-39-x86_64-linux-gnu.so
crtstuff.c:(.text+0x57): additional relocation overflows omitted from the output
dpctl/tensor/_tensor_impl.cpython-39-x86_64-linux-gnu.so: PC-relative offset overflow in PLT entry for `_ZSt10_ConstructIN4sycl3_V15eventEJEEvPT_DpOT0_'
icpx: error: linker command failed with exit code 1 (use -v to see invocation)
```
@mdtoguchi @sergey-semenov I applied the suggested solution (learned in CMPLRLLVM-39897). Are there alternative ways to address the problem? Would splitting the library into several smaller SO files work as well?
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_47 ran successfully.
@oleksandr-pavlyk, breaking up into separate
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1028/index.html
DPC++ 2023 has introduced a regression where the aspect returns 1 even for devices without fp64 aspect.
… since host device has been removed from 2023 compiler
With 2022.2 a non-zero value was being returned for interoperability kernel compiler for GPU device, but it now returns zero with 2023.0 compiler.
The debug build was previously used to accelerate the build, but due to growth of the device section, the debug build now takes longer than the release build.
7b3e89f to 37a3fe7
Moved host operations done after kernel submission to improve chance of detecting non-complete status of kernel submission.
Host device has been removed from the DPC++ compiler in 2023.0.0. Also removed support for host device type and for backend host.
Removed `DPCTLDevice_IsHost`, `DPCTLHostSelector_Create`, `DPCTLContext_IsHost` and use of is_host from Python API.
37a3fe7 to 9aa909f
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_63 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_68 ran successfully.
b5f0f10 to 4b0f484
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_68 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_69 ran successfully.
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_71 ran successfully.
    ASSERT_TRUE(add_private_mem_sz >= 0);
    ASSERT_TRUE(axpy_private_mem_sz >= 0);
This is actually an improvement in DPC++ RT, not a regression. Neither of the kernels use private memory.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.14.1dev0=py310h76be34b_71 ran successfully.
The crash manifested itself as a "relocation truncated to fit" error, and was caused by the large size of device code produced in a debug build. The suggested solution is to use the link option `-fsycl-link-huge-device-code`. See https://github.com/intel/llvm/blob/sycl/sycl/doc/UsersManual.md#link-options

Removed `DPCTLDevice_IsHost`, `DPCTLContext_IsHost`, `DPCTLHostSelector_Create`, and Python API: `dpctl.select_host_device`, `dpctl.has_host_device`, `dpctl.SyclDevice.is_host`, and `dpctl.SyclDevice.has_aspect_host`, as well as the `host` backend and the `host_device` device type.

Also added support for `dpctl.SyclKernel.max_sub_group_size`, which is enabled in DPC++ 2023.

Removed use of the `__SYCL_COMPILER_2023_SWITCHOVER` preprocessor constant and introduced the `__SYCL_COMPILER_VERSION_REQUIRED` constant in `Config/dpctl_config.h`. Introduced `static_assert` in `libsyclinterface/source/*.cpp` files that `#include <CL/sycl.hpp>` to ensure that the compiler meets the minimum required version.
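
The compile-time version gate described above could look roughly like the following. This is a minimal sketch, not the code from this PR. It assumes that DPC++ defines `__SYCL_COMPILER_VERSION` as a date-like integer once the SYCL header is included. The names `found_compiler_version`, `required_compiler_version`, and `meets_required_version`, and the value `20221201`, are hypothetical placeholders chosen so the sketch compiles without a SYCL toolchain.

```cpp
// Sketch of a compile-time compiler-version gate (illustrative only).

// Assumption: DPC++ defines __SYCL_COMPILER_VERSION after <CL/sycl.hpp> is
// included. The fallback value is a placeholder so that the sketch also
// compiles with a non-SYCL compiler.
#if defined(__SYCL_COMPILER_VERSION)
constexpr long found_compiler_version = __SYCL_COMPILER_VERSION;
#else
constexpr long found_compiler_version = 20221201L; // placeholder
#endif

// Hypothetical stand-in for __SYCL_COMPILER_VERSION_REQUIRED, which in dpctl
// is described as coming from Config/dpctl_config.h. The value is a
// placeholder, not the one dpctl uses.
constexpr long required_compiler_version = 20221201L;

// Hypothetical helper: true when the found version is new enough.
constexpr bool meets_required_version(long found, long required)
{
    return found >= required;
}

// Compilation stops here with a readable message if the compiler is too old.
static_assert(meets_required_version(found_compiler_version,
                                     required_compiler_version),
              "The compiler does not meet the minimum version requirement");
```

Placing the check in every translation unit that includes the SYCL header means an outdated compiler is rejected early with a clear message, rather than failing later on a missing API.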