
RuntimeError: Error building extension 'slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0' #58

Open
zhonglin-cdut opened this issue Nov 13, 2024 · 13 comments


@zhonglin-cdut

Traceback (most recent call last):
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 2107, in _run_ninja_build
subprocess.run(
File "D:\Anaconda3\envs\DjPytorch\lib\subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Admin\Desktop\pythonProject(配置GPU环境)\pythonProject1\xlstmtest.py", line 35, in
xlstm_stack = xLSTMBlockStack(cfg)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\xlstm_block_stack.py", line 84, in init
self.blocks = self._create_blocks(config=config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\xlstm_block_stack.py", line 105, in _create_blocks
blocks.append(sLSTMBlock(config=config))
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\block.py", line 33, in init
super().init(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\xlstm_block.py", line 63, in init
self.xlstm = sLSTMLayer(config=self.config.slstm)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\layer.py", line 78, in init
self.slstm_cell = sLSTMCell(self.config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 780, in new
return sLSTMCell_cuda(config, skip_backend_init=skip_backend_init)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 690, in init
self.func = sLSTMCellFuncGenerator(self.training, config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 536, in sLSTMCellFuncGenerator
slstm_cuda = sLSTMCellCUDA.instance(config=config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 515, in instance
cls.mod[repr(config)] = load(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\src\cuda_init.py", line 84, in load
mod = _load(name + suffix, sources, **myargs)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1309, in load
return _jit_compile(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1719, in _jit_compile
_write_ninja_file_and_build_library(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
_run_ninja_build(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 2123, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0'

This problem has been bothering me for three days. Does anyone know how to solve it? I really need this working.

@WangYLon

“subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.”
Please check that ninja is properly set up in your environment.

@IamYipi

IamYipi commented Nov 18, 2024

I have the same issue.

I installed ninja with:
choco install ninja

and in my terminal I can execute successfully:
ninja --version
1.12.1

when I execute:
ninja -v

ninja: error: loading 'build.ninja': The system cannot find the file specified.

I'm on Windows, but I also tested on Debian and hit the same issue.
I'm lost with this error, can you help us please?
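For reference, PyTorch can report whether it sees ninja itself. A minimal check (a sketch; is_ninja_available and verify_ninja_availability are part of torch.utils.cpp_extension):

import torch.utils.cpp_extension as cpp_ext

print(cpp_ext.is_ninja_available())  # True if the ninja binary is on PATH
cpp_ext.verify_ninja_availability()  # raises RuntimeError if ninja is missing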

@WangYLon

When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps.

@dearsikadeer

Exactly the same thing happened to me :(
Can anyone help? Thanks in advance! I really need xlstm!


Environment:

  • CUDA: 11.3
  • Torch: 2.1.0+cu118
  • Python: 3.10.15
  • GPU: NVIDIA GeForce RTX 3090
  • OS: Ubuntu 20.04.5 LTS

CalledProcessError Traceback (most recent call last)
File ~/miniconda3/envs/main/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2100, in _run_ninja_build(build_directory, verbose, error_prefix)
2099 stdout_fileno = 1
-> 2100 subprocess.run(
2101 command,
2102 stdout=stdout_fileno if verbose else subprocess.PIPE,
2103 stderr=subprocess.STDOUT,
2104 cwd=build_directory,
2105 check=True,
2106 env=env)
2107 except subprocess.CalledProcessError as e:
2108 # Python 2 and 3 compatible way of getting the error object.

File ~/miniconda3/envs/main/lib/python3.10/subprocess.py:526, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
525 if check and retcode:
--> 526 raise CalledProcessError(retcode, process.args,
527 output=stdout, stderr=stderr)
528 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.


[1/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu -o slstm_forward.cuda.o
FAILED: slstm_forward.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu -o slstm_forward.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu".
[2/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu -o slstm_pointwise.cuda.o
FAILED: slstm_pointwise.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu -o slstm_pointwise.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu".
[3/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu -o slstm_backward.cuda.o
FAILED: slstm_backward.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu -o slstm_backward.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu(54): warning: parameter "num_gates_i" was declared but never referenced

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu(54): warning: parameter "num_gates_t" was declared but never referenced

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu".
[4/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu -o slstm_backward_cut.cuda.o
FAILED: slstm_backward_cut.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu -o slstm_backward_cut.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu(54): warning: parameter "num_gates_i" was declared but never referenced

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu(54): warning: parameter "num_gates_t" was declared but never referenced

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu".
ninja: build stopped: subcommand failed.

@dearsikadeer

"When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps."

@WangYLon What do you mean by "run xlstm's program directly"? I simply tried to build an instance this way, and the exception above follows :(

from model.ModelxLSTMMixer import xLSTMMixer

import os

os.environ['CUDA_LIB'] = '/usr/local/cuda/lib64'
model = xLSTMMixer(
    pred_len=96,
    seq_len=20,
    enc_in=1,
    xlstm_embedding_dim=256,
)

@WangYLon

@dearsikadeer

My answer

"When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps."

was addressed to @IamYipi. When a program containing sLSTM runs, Ninja is called automatically, so there is no need to execute 'ninja -v' manually. Ninja is a build system, similar to GNU Make, used to compile code and manage dependencies efficiently. The sLSTM code includes .cu files (CUDA C/C++ sources); when sLSTM is part of the model, the code automatically calls Ninja to compile those CUDA sources and build the required extension.
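To illustrate (a sketch only; the extension name and source list below are hypothetical, not the library's exact call): the extension is JIT-compiled through torch.utils.cpp_extension.load, which writes a build.ninja into a temporary build directory and runs ninja there. That is why running 'ninja -v' by hand in another directory cannot find build.ninja.

from torch.utils.cpp_extension import load

# Hypothetical example of the kind of call xlstm makes internally.
mod = load(
    name="slstm_example",
    sources=["slstm_forward.cu", "slstm_pointwise.cu", "slstm_backward.cu"],
    verbose=True,  # prints the full ninja/nvcc build log
)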

Judging from your error, @dearsikadeer, your ninja has already started compiling and building slstm, and the build fails for a different reason. You might check your gcc/g++ version.

@zhonglin-cdut
Author

Thank you for your suggestions. I will try every possible method.

@IamYipi

IamYipi commented Nov 18, 2024

(quoting @WangYLon's answer above)

Thanks for the reply. I fixed it by setting the CUDA_HOME environment variable to the correct path: /usr/local/cuda.

@dearsikadeer

@WangYLon Thanks! I still don't know how to fix the problem. My gcc version:

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

I added this to my .bashrc file but it didn't help.
export CUDA_LIB=/usr/local/cuda/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
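An in-process alternative (a minimal sketch, assuming CUDA is installed under /usr/local/cuda): set the variables before anything triggers the JIT build, since edits to .bashrc only take effect in newly started shells.

import os

# Must run before xlstm's CUDA extension build is triggered.
os.environ["CUDA_HOME"] = "/usr/local/cuda"
os.environ["CUDA_LIB"] = "/usr/local/cuda/lib64"
os.environ["PATH"] = "/usr/local/cuda/bin:" + os.environ.get("PATH", "")

import xlstm  # import after the environment is set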

@HungNguyen4864

HungNguyen4864 commented Nov 22, 2024

You can change the backend of sLSTM to 'vanilla', as the author commented.
Like this:

from xlstm import sLSTMBlockConfig, sLSTMLayerConfig, FeedForwardConfig

slstm_block = sLSTMBlockConfig(
    slstm=sLSTMLayerConfig(
        backend="vanilla",
        num_heads=4,
        conv1d_kernel_size=4,
        bias_init="powerlaw_blockdependent",
    ),
    feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
)

It should work.
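For context, here is a fuller sketch of where that block config slots in, adapted from the usage example in the xlstm README (the mLSTM settings and the context_length / num_blocks / embedding_dim / slstm_at values below are illustrative):

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="vanilla",  # avoids the CUDA JIT build entirely
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],  # indices of blocks that use sLSTM instead of mLSTM
)
xlstm_stack = xLSTMBlockStack(cfg)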

@oloooooo

@HungNguyen4864 Thank you! It works. But can I put the model on the GPU for training in 'vanilla' mode?

@HungNguyen4864

HungNguyen4864 commented Nov 23, 2024

@oloooooo

The sLSTMCell_vanilla class uses .reshape() and .permute() operations. These do not inherently require a GPU, which indicates that this class is designed to run without GPU acceleration.

In contrast, the sLSTMCell_cuda class has an __init__ constructor with a skip_backend_init parameter, which lets it skip initializing the components needed for GPU operation; this is useful when converting models between hardware configurations. Its _impl and _impl_step methods use self.func.apply, a function generated specifically for GPU execution, and input tensors must be made .contiguous() before being passed to it, a requirement for efficient CUDA operations.

If you still want to use the CUDA backend, changes to the source code would be necessary. I can't provide specifics on how to make those changes, but I can confirm the information above.

You can read more at this path: xlstm/blocks/slstm/cell.py

Since I don't know how to fix it myself: if you do manage to fix it, and it's not too much trouble, could you please share the solution?
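On the GPU question above: if the vanilla cell really is plain PyTorch ops, as described, the model can still be trained on the GPU by moving it and its inputs there with .to(); it simply runs the reference implementation instead of the fused CUDA kernels. A minimal sketch (reusing cfg and xlstm_stack from the example above):

import torch

xlstm_stack = xlstm_stack.to("cuda")
x = torch.randn(4, 256, 128, device="cuda")  # (batch, context_length, embedding_dim)
y = xlstm_stack(x)  # forward pass on GPU, no nvcc/ninja build required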

@2022LJC

2022LJC commented Dec 27, 2024

Exactly the same thing happened to me :( Can anyone help? Thanks in advance! I really need xlstm!

My environment and full build log are identical to @dearsikadeer's comment above (CUDA 11.3, Torch 2.1.0+cu118, Python 3.10.15, RTX 3090, Ubuntu 20.04.5 LTS), down to the same six "identifier is undefined" errors for __hadd_rn, __hsub_rn, and __hmul_rn in inline_ops_fp16.cuh and inline_ops_bf16.cuh, ending with "ninja: build stopped: subcommand failed."

How can this be fixed? Does anyone have an answer?
