
RuntimeError: Error building extension 'slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0' #58

Open
zhonglin-cdut opened this issue Nov 13, 2024 · 13 comments


@zhonglin-cdut

Traceback (most recent call last):
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 2107, in _run_ninja_build
subprocess.run(
File "D:\Anaconda3\envs\DjPytorch\lib\subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Admin\Desktop\pythonProject(配置GPU环境)\pythonProject1\xlstmtest.py", line 35, in
xlstm_stack = xLSTMBlockStack(cfg)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\xlstm_block_stack.py", line 84, in init
self.blocks = self._create_blocks(config=config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\xlstm_block_stack.py", line 105, in _create_blocks
blocks.append(sLSTMBlock(config=config))
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\block.py", line 33, in init
super().init(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\xlstm_block.py", line 63, in init
self.xlstm = sLSTMLayer(config=self.config.slstm)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\layer.py", line 78, in init
self.slstm_cell = sLSTMCell(self.config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 780, in new
return sLSTMCell_cuda(config, skip_backend_init=skip_backend_init)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 690, in init
self.func = sLSTMCellFuncGenerator(self.training, config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 536, in sLSTMCellFuncGenerator
slstm_cuda = sLSTMCellCUDA.instance(config=config)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\cell.py", line 515, in instance
cls.mod[repr(config)] = load(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\xlstm\blocks\slstm\src\cuda_init.py", line 84, in load
mod = _load(name + suffix, sources, **myargs)
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1309, in load
return _jit_compile(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1719, in _jit_compile
_write_ninja_file_and_build_library(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1832, in _write_ninja_file_and_build_library
_run_ninja_build(
File "D:\Anaconda3\envs\DjPytorch\lib\site-packages\torch\utils\cpp_extension.py", line 2123, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'slstm_HS128BS8NH4NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0'

This problem has been bothering me for three days. Does anyone know how to solve it? I really need this working.

@WangYLon

“subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.”
Please check that ninja is properly set up in your environment.

@IamYipi

IamYipi commented Nov 18, 2024

I have the same issue.

I installed ninja with:
choco install ninja

and in my terminal I can execute successfully:
ninja --version
1.12.1

when I execute:
ninja -v

ninja: error: loading 'build.ninja': The system cannot find the file specified.

I'm on Windows, but I also tested on Debian and hit the same issue.
I'm lost with this error, can you help us please?
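For reference, PyTorch can report whether it sees ninja itself. A minimal check (a sketch; is_ninja_available and verify_ninja_availability are part of torch.utils.cpp_extension):

import torch.utils.cpp_extension as cpp_ext

print(cpp_ext.is_ninja_available())  # True if the ninja binary is on PATH
cpp_ext.verify_ninja_availability()  # raises RuntimeError if ninja is missing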

@WangYLon

When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps.

@dearsikadeer

Exactly the same thing happened to me :(
Can anyone help? Thanks in advance! I really need xlstm!


Environment:

  • CUDA: 11.3
  • Torch: 2.1.0+cu118
  • Python: 3.10.15
  • GPU: NVIDIA GeForce RTX 3090
  • OS: Ubuntu 20.04.5 LTS

CalledProcessError Traceback (most recent call last)
File ~/miniconda3/envs/main/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2100, in _run_ninja_build(build_directory, verbose, error_prefix)
2099 stdout_fileno = 1
-> 2100 subprocess.run(
2101 command,
2102 stdout=stdout_fileno if verbose else subprocess.PIPE,
2103 stderr=subprocess.STDOUT,
2104 cwd=build_directory,
2105 check=True,
2106 env=env)
2107 except subprocess.CalledProcessError as e:
2108 # Python 2 and 3 compatible way of getting the error object.

File ~/miniconda3/envs/main/lib/python3.10/subprocess.py:526, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
525 if check and retcode:
--> 526 raise CalledProcessError(retcode, process.args,
527 output=stdout, stderr=stderr)
528 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.


[1/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu -o slstm_forward.cuda.o
FAILED: slstm_forward.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu -o slstm_forward.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_forward.cu".
[2/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu -o slstm_pointwise.cuda.o
FAILED: slstm_pointwise.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu -o slstm_pointwise.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_pointwise.cu".
[3/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu -o slstm_backward.cuda.o
FAILED: slstm_backward.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu -o slstm_backward.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu(54): warning: parameter "num_gates_i" was declared but never referenced

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu(54): warning: parameter "num_gates_t" was declared but never referenced

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward.cu".
[4/5] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu -o slstm_backward_cut.cuda.o
FAILED: slstm_backward_cut.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=slstm_HS256BS8NH8NS4DBfDRbDWbDGbDSbDAfNG4SA1GRCV0GRC0d0FCV0FC0d0 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/main/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/user/miniconda3/envs/main/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -Xptxas="-v" -gencode arch=compute_80,code=compute_80 -res-usage --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DSLSTM_HIDDEN_SIZE=256 -DSLSTM_BATCH_SIZE=8 -DSLSTM_NUM_HEADS=8 -DSLSTM_NUM_STATES=4 -DSLSTM_DTYPE_B=float -DSLSTM_DTYPE_R=nv_bfloat16 -DSLSTM_DTYPE_W=nv_bfloat16 -DSLSTM_DTYPE_G=nv_bfloat16 -DSLSTM_DTYPE_S=nv_bfloat16 -DSLSTM_DTYPE_A=float -DSLSTM_NUM_GATES=4 -DSLSTM_SIMPLE_AGG=true -DSLSTM_GRADIENT_RECURRENT_CLIPVAL_VALID=false -DSLSTM_GRADIENT_RECURRENT_CLIPVAL=0.0 -DSLSTM_FORWARD_CLIPVAL_VALID=false -DSLSTM_FORWARD_CLIPVAL=0.0 -U__CUDA_NO_HALF_OPERATORS -U__CUDA_NO_HALF_CONVERSIONS -U__CUDA_NO_BFLOAT16_OPERATORS -U__CUDA_NO_BFLOAT16_CONVERSIONS -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ -std=c++17 -c /home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu -o slstm_backward_cut.cuda.o
/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(29): remark: #pragma message: "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh:29 CUDART_VERSION with FP16: 11030, CUDA_ARCH: 800"

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(70): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(76): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_fp16.cuh(87): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(76): error: identifier "__hadd_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(83): error: identifier "__hsub_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/../util/inline_ops_bf16.cuh(96): error: identifier "__hmul_rn" is undefined

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu(54): warning: parameter "num_gates_i" was declared but never referenced

/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu(54): warning: parameter "num_gates_t" was declared but never referenced

6 errors detected in the compilation of "/home/user/miniconda3/envs/main/lib/python3.10/site-packages/xlstm/blocks/slstm/src/cuda/slstm_backward_cut.cu".
ninja: build stopped: subcommand failed.

@dearsikadeer

"When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps."

@WangYLon What do you mean by "run xlstm's program directly"? I simply tried to build an instance this way, and the exception above follows :(

from model.ModelxLSTMMixer import xLSTMMixer

import os

os.environ['CUDA_LIB'] = '/usr/local/cuda/lib64'
model = xLSTMMixer(
    pred_len=96,
    seq_len=20,
    enc_in=1,
    xlstm_embedding_dim=256,
)

@WangYLon

@dearsikadeer

My answer

"When you run the xlstm program directly, 'ninja -v' is invoked automatically as part of the build. When you run the command manually, there is no build.ninja file in the current directory, which is why it fails. Hope this helps."

was addressed to @IamYipi. When a program containing sLSTM runs, Ninja is called automatically, so there is no need to execute 'ninja -v' manually. Ninja is a build system, similar to GNU Make, used to compile code and manage dependencies efficiently. The sLSTM code includes .cu files (CUDA C/C++ sources); when sLSTM is part of the model, the code automatically calls Ninja to compile those CUDA sources and build the required extension.
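To illustrate (a sketch only; the extension name and source list below are hypothetical, not the library's exact call): the extension is JIT-compiled through torch.utils.cpp_extension.load, which writes a build.ninja into a temporary build directory and runs ninja there. That is why running 'ninja -v' by hand in another directory cannot find build.ninja.

from torch.utils.cpp_extension import load

# Hypothetical example of the kind of call xlstm makes internally.
mod = load(
    name="slstm_example",
    sources=["slstm_forward.cu", "slstm_pointwise.cu", "slstm_backward.cu"],
    verbose=True,  # prints the full ninja/nvcc build log
)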

Judging from your error, @dearsikadeer, your ninja has already started compiling and building slstm, and the build fails for a different reason. You might check your gcc/g++ version.

@zhonglin-cdut
Author

Thank you for your suggestions. I will try every possible method.

@IamYipi

IamYipi commented Nov 18, 2024

(quoting @WangYLon's answer above)

Thanks for the reply. I fixed it by setting the CUDA_HOME environment variable to the correct path: /usr/local/cuda.

@dearsikadeer

@WangYLon Thanks! I still don't know how to fix the problem. My gcc version:

gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

I added this to my .bashrc file but it didn't help.
export CUDA_LIB=/usr/local/cuda/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
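An in-process alternative (a minimal sketch, assuming CUDA is installed under /usr/local/cuda): set the variables before anything triggers the JIT build, since edits to .bashrc only take effect in newly started shells.

import os

# Must run before xlstm's CUDA extension build is triggered.
os.environ["CUDA_HOME"] = "/usr/local/cuda"
os.environ["CUDA_LIB"] = "/usr/local/cuda/lib64"
os.environ["PATH"] = "/usr/local/cuda/bin:" + os.environ.get("PATH", "")

import xlstm  # import after the environment is set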

@HungNguyen4864

HungNguyen4864 commented Nov 22, 2024

You can change the backend of sLSTM to 'vanilla', as the author commented.
Like this:

from xlstm import sLSTMBlockConfig, sLSTMLayerConfig, FeedForwardConfig

slstm_block = sLSTMBlockConfig(
    slstm=sLSTMLayerConfig(
        backend="vanilla",
        num_heads=4,
        conv1d_kernel_size=4,
        bias_init="powerlaw_blockdependent",
    ),
    feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
)

It should work.
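For context, here is a fuller sketch of where that block config slots in, adapted from the usage example in the xlstm README (the mLSTM settings and the context_length / num_blocks / embedding_dim / slstm_at values below are illustrative):

from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(
            conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
        )
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="vanilla",  # avoids the CUDA JIT build entirely
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],  # indices of blocks that use sLSTM instead of mLSTM
)
xlstm_stack = xLSTMBlockStack(cfg)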

@oloooooo

@HungNguyen4864 Thank you! It works. But can I put the model on the GPU for training in 'vanilla' mode?

@HungNguyen4864

HungNguyen4864 commented Nov 23, 2024

@oloooooo

The sLSTMCell_vanilla class uses .reshape() and .permute() operations. These do not inherently require a GPU, which indicates that this class is designed to run without GPU acceleration.

In contrast, the sLSTMCell_cuda class has an __init__ constructor with a skip_backend_init parameter, which lets it skip initializing the components needed for GPU operation; this is useful when converting models between hardware configurations. Its _impl and _impl_step methods use self.func.apply, a function generated specifically for GPU execution, and input tensors must be made .contiguous() before being passed to it, a requirement for efficient CUDA operations.

If you still want to use the CUDA backend, changes to the source code would be necessary. I can't provide specifics on how to make those changes, but I can confirm the information above.

You can read more at this path: xlstm/blocks/slstm/cell.py

Since I don't know how to fix it myself: if you do manage to fix it, and it's not too much trouble, could you please share the solution?
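On the GPU question above: if the vanilla cell really is plain PyTorch ops, as described, the model can still be trained on the GPU by moving it and its inputs there with .to(); it simply runs the reference implementation instead of the fused CUDA kernels. A minimal sketch (reusing cfg and xlstm_stack from the example above):

import torch

xlstm_stack = xlstm_stack.to("cuda")
x = torch.randn(4, 256, 128, device="cuda")  # (batch, context_length, embedding_dim)
y = xlstm_stack(x)  # forward pass on GPU, no nvcc/ninja build required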

@2022LJC

2022LJC commented Dec 27, 2024

Exactly the same thing happened to me :( Can anyone help? Thanks in advance! I really need xlstm!

My environment and full build log are identical to @dearsikadeer's comment above (CUDA 11.3, Torch 2.1.0+cu118, Python 3.10.15, RTX 3090, Ubuntu 20.04.5 LTS), down to the same six "identifier is undefined" errors for __hadd_rn, __hsub_rn, and __hmul_rn in inline_ops_fp16.cuh and inline_ops_bf16.cuh, ending with "ninja: build stopped: subcommand failed."

How can this be fixed? Does anyone have an answer?
