I'm running TensorRT-LLM on a V100. When I enabled FMHA with --enable_context_fmha, I got this error message:
[TensorRT-LLM][ERROR] Assertion failed: Unsupported architecture (/home/build/TensorRT_LLM/TensorRT-LLM-master/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fmhaRunner.cpp:87)
I checked the code of FusedMHARunnerV2, and it seems SM70 and SM75 are not supported.
May I know why V100 is not supported for FMHA? Is support for it planned?
Thanks!
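For reference, here is a minimal sketch of the kind of architecture gate that trips this assertion, assuming the runner simply compares the device's SM version against a minimum. The function and variable names are illustrative, not TensorRT-LLM's actual API; only the SM70/SM75 exclusion and the error text come from the thread:

```python
# Illustrative sketch of the SM gate behind the assertion above
# (names here are hypothetical, not TensorRT-LLM's internals).
import torch

def context_fmha_supported() -> bool:
    major, minor = torch.cuda.get_device_capability()
    sm = major * 10 + minor
    # The fused context MHA kernels require SM80 (Ampere) or newer;
    # SM70 (V100) and SM75 (T4) fall through to the assertion.
    return sm >= 80

if not context_fmha_supported():
    raise RuntimeError("Unsupported architecture")  # analogous to fmhaRunner.cpp:87
```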
Hi @jiangsongHW! Thanks for the request! Some of the techniques in our fMHA aren't supported below SM80, so we would need a different kernel for V100. There are some complexities in porting the kernel to V100, particularly around FP32 accumulation to preserve accuracy. We may implement a custom pre-SM80 fMHA in the future, but it's unlikely in the near term.
We know HW access is an issue with A100 & H100; the good news is that once you do get access, the perf/$ will be better than on V100!
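Until then, a hedged workaround sketch: only pass --enable_context_fmha when the GPU is SM80 or newer, so V100 (SM70) builds fall back to the unfused attention path. Only the flag name comes from this thread; the flag-assembly code is illustrative:

```python
# Append --enable_context_fmha only on SM80+ GPUs; on older parts
# (e.g., V100 at SM70) omit it and use the unfused path instead.
import torch

def build_flags() -> list:
    flags = []
    major, minor = torch.cuda.get_device_capability()
    if major * 10 + minor >= 80:
        flags.append("--enable_context_fmha")
    return flags

print(build_flags())  # [] on V100, ['--enable_context_fmha'] on A100/H100
```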