NPU UMD driver causes oneAPI SYCL runtime crash #29

Open
wistal opened this issue May 15, 2024 · 2 comments

Comments

@wistal

wistal commented May 15, 2024

Without the NPU UMD driver installed, oneAPI works fine.
After the NPU UMD driver is installed, the oneAPI SYCL runtime crashes.

Test system: Ubuntu 22.04 on MTL 165H.
Confirmed that the NPU KMD driver loads correctly:

[ 2.150613] intel_vpu 0000:00:0b.0: enabling device (0000 -> 0002)
[ 2.166532] intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v0.0.bin, version: 20240221MTL_CLIENT_SILICON-release2101ci_tag_ud202408_vpu_rc_20240221_2101845c105994a
[ 2.290232] [drm] Initialized intel_vpu 1.0.0 20230117 for 0000:00:0b.0 on minor 0

===============================
Installed oneAPI 2024.0 with the iGPU driver, without the NPU UMD driver:

. intel/oneapi/setvars.sh

:: initializing oneAPI environment ...
-bash: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

~$ sycl-ls

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]


Everything works.

After installing the NPU UMD driver:
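
For comparing the device list before and after the driver install, the `sycl-ls` output can be parsed with a short script. This is only a sketch and not part of the original report: `parse_sycl_ls` and its regular expression are illustrative names, and the parsing assumes the bracketed `[backend:type:index]` prefix seen in the output above.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the sycl-ls output in this report:
#   [backend:type:index] description
DEVICE_RE = re.compile(
    r'^\[(?P<backend>[^:\]]+):(?P<type>[^:\]]+):(?P<index>\d+)\]\s+(?P<desc>.+)$'
)

def parse_sycl_ls(output):
    """Group sycl-ls device lines by backend; lines that do not match are ignored."""
    devices = defaultdict(list)
    for line in output.splitlines():
        m = DEVICE_RE.match(line.strip())
        if m:
            devices[m.group("backend")].append(
                (m.group("type"), int(m.group("index")), m.group("desc"))
            )
    return dict(devices)

# Output captured before the NPU UMD driver was installed.
before = """\
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 165H OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]
"""

devices = parse_sycl_ls(before)
for backend, entries in devices.items():
    print(backend, [(dev_type, index) for dev_type, index, _ in entries])
```

An exception message in place of a device list yields an empty dictionary, so a before/after comparison shows at a glance that every backend disappeared rather than only the Level Zero one.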

~/npu/1.20$ ls
intel-driver-compiler-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb intel-fw-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb intel-level-zero-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb npu.sh
~/npu/1.20$ sudo dpkg -i *.deb
Selecting previously unselected package intel-driver-compiler-npu.
(Reading database ... 209200 files and directories currently installed.)
Preparing to unpack intel-driver-compiler-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-driver-compiler-npu (1.2.0.20240404-8553879914) ...
Selecting previously unselected package intel-fw-npu.
Preparing to unpack intel-fw-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-fw-npu (1.2.0.20240404-8553879914) ...
Selecting previously unselected package intel-level-zero-npu.
Preparing to unpack intel-level-zero-npu_1.2.0.20240404-8553879914_ubuntu22.04_amd64.deb ...
Unpacking intel-level-zero-npu (1.2.0.20240404-8553879914) ...
Setting up intel-driver-compiler-npu (1.2.0.20240404-8553879914) ...
Setting up intel-fw-npu (1.2.0.20240404-8553879914) ...
Setting up intel-level-zero-npu (1.2.0.20240404-8553879914) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1)


Check the oneAPI SYCL devices again:

~$ sycl-ls
SYCL Exception encountered: Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)


An error occurs: with the NPU driver installed, sycl-ls crashes.

Checking with strace:
~$ strace sycl-ls

... skipped logs...
futex(0x746c17666178, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/dev/accel/accel0", O_RDWR|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFCHR|0660, st_rdev=makedev(0x105, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(4, DRM_IOCTL_VERSION, 0x7ffc772356c0) = 0
ioctl(4, DRM_IOCTL_VERSION, 0x7ffc772356c0) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc772355a0) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or
...skipped logs...
DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc77235560) = 0
ioctl(4, DRM_IOCTL_ETNAVIV_GET_PARAM or DRM_IOCTL_EXYNOS_GEM_CREATE or DRM_IOCTL_LIMA_GET_PARAM or DRM_IOCTL_MSM_GET_PARAM or DRM_IOCTL_OMAP_GET_PARAM or DRM_IOCTL_TEGRA_GEM_CREATE, 0x7ffc77235560) = 0
close(4) = 0
openat(AT_FDCWD, "/dev/accel/accel1", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
... skipped logs ...
openat(AT_FDCWD, "/dev/accel/accel61", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/accel/accel62", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/accel/accel63", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mpi/2021.11/opt/mpi/libfabric/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mpi/2021.11/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/mkl/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ippcp/2021.9/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ipp/2021.10/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dpl/2022.3/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dnnl/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/debugger/2024.0/opt/debugger/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/dal/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/opt/oclfpga/host/linux64/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/opt/compiler/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/compiler/2024.0/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ccl/2021.11/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=56923, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 56923, PROT_READ, MAP_PRIVATE, 4, 0) = 0x746c34043000
close(4) = 0
openat(AT_FDCWD, "/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = 4
read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(4, "", {st_mode=S_IFREG|0644, st_size=72440896, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 73455568, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x746bf7800000
mprotect(0x746bf7a59000, 69115904, PROT_NONE) = 0
mmap(0x746bf7a59000, 51245056, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x259000) = 0x746bf7a59000
mmap(0x746bfab38000, 17866752, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x3338000) = 0x746bfab38000
mmap(0x746bfbc43000, 868352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x4442000) = 0x746bfbc43000
mmap(0x746bfbd17000, 1009616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x746bfbd17000
close(4) = 0
mprotect(0x746bfbc43000, 815104, PROT_READ) = 0
brk(0x5b437800a000) = 0x5b437800a000
brk(0x5b437802b000) = 0x5b437802b000

...skipped logs...

futex(0x5b43789d5b90, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x746c34f73210, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "SYCL Exception encountered: ", 28SYCL Exception encountered: ) = 28
write(2, "Native API failed. Native API re"..., 96Native API failed. Native API returns: -30 (PI_ERROR_INVALID_VALUE) -30 (PI_ERROR_INVALID_VALUE)) = 96
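
A saved strace log of this size is easier to read once the `openat` calls for a given path are separated into hits and misses. The following is a minimal sketch, not something from the original report: `summarize_opens` is a hypothetical helper, and the regular expression assumes the `openat(AT_FDCWD, "...", FLAGS) = N` line shape shown in the log above.

```python
import re

# Matches strace lines such as:
#   openat(AT_FDCWD, "/lib/libfoo.so", O_RDONLY|O_CLOEXEC) = 4
#   openat(AT_FDCWD, "/x/libfoo.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
OPENAT_RE = re.compile(
    r'^openat\(AT_FDCWD, "(?P<path>[^"]+)", [^)]*\)\s*=\s*(?P<ret>-?\d+)'
)

def summarize_opens(strace_text, needle):
    """Return (opened, missed) path lists for openat calls whose path contains needle."""
    opened, missed = [], []
    for line in strace_text.splitlines():
        m = OPENAT_RE.match(line.strip())
        if not m or needle not in m.group("path"):
            continue
        # A non-negative return value is a file descriptor, i.e. the open succeeded.
        (opened if int(m.group("ret")) >= 0 else missed).append(m.group("path"))
    return opened, missed

# A few lines copied from the strace log in this report.
sample = """\
openat(AT_FDCWD, "/dev/accel/accel0", O_RDWR|O_CLOEXEC) = 4
openat(AT_FDCWD, "/dev/accel/accel1", O_RDWR|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/user/intel/oneapi/ccl/2021.11/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libvpux_driver_compiler.so", O_RDONLY|O_CLOEXEC) = 4
"""

opened, missed = summarize_opens(sample, "libvpux_driver_compiler.so")
print("loaded from:", opened)
print("search misses:", len(missed))
```

Run against the full log with `needle="/dev/accel/"`, the same helper would list which accel nodes were actually opened during device probing.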

=====================================================
Check with OpenVINO:
~$ benchmark_app -h | grep Avail
Available target devices: CPU GPU NPU
The NPU works normally with OpenVINO.

@kpradzyn
Contributor

Hey @wistal

This is a known issue with the L0 API design. In general, fixing it requires:

  1. An L0 API change. Here is the proposal: "Add a new single API for Level Zero Init and Driver Retrieval", oneapi-src/level-zero-spec#298 (still open).
  2. A sycl-ls update to pick up the L0 API change above.

@wistal
Author

wistal commented May 29, 2024

Thanks for the update.
