QwenVL build failed. #2483

Open
Wonder-donbury opened this issue Nov 22, 2024 · 2 comments
Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)

Comments

@Wonder-donbury

System Info

  • CPU architecture : x86_64
  • GPU properties
    • GPU name : 4x L4 setup
    • GPU memory size : 96GB
  • Libraries
    • TensorRT-LLM branch or tag : main
    • TensorRT-LLM version : 0.16.0.dev2024111900
    • CUDA version : 12.4.r12.4
  • NVIDIA driver version : 560.35.03
  • OS : Ubuntu 22.04

Who can help?

@kaiyux (I'm tagging you since you were the author of the related commit)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce the behavior:

1. Followed steps 1 through 3 (build the TensorRT-LLM engine) in examples/qwenvl with the exact same commands.
2. Ran into the error message below.

Expected behavior

The build completes successfully.

Actual behavior

./build_qwenVL.sh contains the command below (exactly the same as in examples/qwenvl):

trtllm-build --checkpoint_dir=./tllm_checkpoint_1gpu \
             --gemm_plugin=float16 --gpt_attention_plugin=float16 \
             --max_input_len=2048 --max_seq_len=3072 \
             --max_batch_size=8 --max_prompt_embedding_table_size=2048 \
             --remove_input_padding=enable \
             --output_dir=./trt_engines/Qwen-VL-7B-Chat

(venv) admin@inference_l4x4_admin:~/tensorrt$ ./build_qwenVL.sh --verbose
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified

[TensorRT-LLM] TensorRT-LLM version: 0.16.0.dev2024111900
[11/22/2024-06:24:54] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set gemm_plugin to float16.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set nccl_plugin to auto.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set lora_plugin to None.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set moe_plugin to auto.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set low_latency_gemm_swiglu_plugin to None.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set context_fmha to True.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set remove_input_padding to True.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set reduce_fusion to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set enable_xqa to True.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set tokens_per_block to 64.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set multiple_profiles to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set paged_state to True.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set streamingllm to False.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set use_fused_mlp to True.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set pp_reduce_scatter to False.
[11/22/2024-06:24:54] [TRT-LLM] [W] Implicitly setting QWenConfig.qwen_type = qwen
[11/22/2024-06:24:54] [TRT-LLM] [W] Implicitly setting QWenConfig.moe_intermediate_size = 0
[11/22/2024-06:24:54] [TRT-LLM] [W] Implicitly setting QWenConfig.moe_shared_expert_intermediate_size = 0
[11/22/2024-06:24:54] [TRT-LLM] [W] Implicitly setting QWenConfig.tie_word_embeddings = False
[11/22/2024-06:24:54] [TRT-LLM] [I] Set dtype to bfloat16.
[11/22/2024-06:24:54] [TRT-LLM] [I] Set paged_kv_cache to True.
[11/22/2024-06:24:54] [TRT-LLM] [W] Overriding paged_state to False
[11/22/2024-06:24:54] [TRT-LLM] [I] Set paged_state to False.
[11/22/2024-06:24:54] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.

[11/22/2024-06:24:54] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[11/22/2024-06:24:57] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 5665, GPU 322 (MiB)
[11/22/2024-06:25:00] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2276, GPU +440, now: CPU 8097, GPU 762 (MiB)
[11/22/2024-06:25:00] [TRT-LLM] [I] Set nccl_plugin to None.
[11/22/2024-06:25:00] [TRT] [W] IElementWiseLayer with inputs QWenForCausalLM/transformer/vocab_embedding/where_L2963/SELECT_2_output_0 and QWenForCausalLM/transformer/layers/0/attention/dense/multiply_collect_L272/multiply_and_lora_L238/_gemm_plugin_L129/PLUGIN_V2_Gemm_0_output_0: first input has type BFloat16 but second input has type Half.
[11/22/2024-06:25:01] [TRT] [E] ITensor::getDimensions: Error Code 4: API Usage Error (QWenForCausalLM/transformer/layers/0/__add___L322/elementwise_binary_L2877/ELEMENTWISE_SUM_0: ElementWiseOperation SUM must have same input types. But they are of types BFloat16 and Half.)
Traceback (most recent call last):
File "/home/admin/tensorrt/venv/bin/trtllm-build", line 8, in
sys.exit(main())
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
engine = build_model(build_config,
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
return build(model, build_config)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1216, in build
model(**inputs)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in call
output = self.forward(*args, **kwargs)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 962, in forward
hidden_states = self.transformer.forward(**kwargs)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 193, in forward
hidden_states = self.layers.forward(
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 550, in forward
hidden_states = layer(
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in call
output = self.forward(*args, **kwargs)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 137, in forward
hidden_states = residual + attention_output
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/functional.py", line 322, in add
return add(self, b)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/functional.py", line 2877, in elementwise_binary
return _create_tensor(layer.get_output(0), layer)
File "/home/admin/tensorrt/venv/lib/python3.10/site-packages/tensorrt_llm/functional.py", line 608, in _create_tensor
assert trt_tensor.shape.__len__(
AssertionError: tensor QWenForCausalLM/transformer/layers/0/__add___L322/elementwise_binary_L2877/ELEMENTWISE_SUM_0_output_0 has an invalid shape
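
The failing SUM adds the residual (BFloat16, the checkpoint dtype) to the attention output (Half, from the float16 gemm plugin). A quick way to confirm the checkpoint side of the mismatch (a hypothetical one-liner; the path matches --checkpoint_dir above and assumes the usual config.json written by convert_checkpoint.py):

grep '"dtype"' ./tllm_checkpoint_1gpu/config.json
# prints "dtype": "bfloat16" here, while trtllm-build was asked for float16 plugins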

Additional notes

No clue.

Wonder-donbury added the bug (Something isn't working) label on Nov 22, 2024
@Wonder-donbury (Author)

Setting both --gemm_plugin and --gpt_attention_plugin to auto resolved the problem, but I still want to know why the data types weren't matching. I've also checked that the compiled model works fine afterwards in TensorRT.
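
For reference, a sketch of the build command that worked for me (identical to the example except for the two plugin flags):

trtllm-build --checkpoint_dir=./tllm_checkpoint_1gpu \
             --gemm_plugin=auto --gpt_attention_plugin=auto \
             --max_input_len=2048 --max_seq_len=3072 \
             --max_batch_size=8 --max_prompt_embedding_table_size=2048 \
             --remove_input_padding=enable \
             --output_dir=./trt_engines/Qwen-VL-7B-Chat

As I understand it, auto lets both plugins follow the dtype recorded in the checkpoint instead of forcing float16.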

@sunnyqgg (Collaborator)

You can avoid this error by specifying --dtype float16 when running convert_checkpoint.py:

python3 ./examples/qwen/convert_checkpoint.py --model_dir=./Qwen-VL-Chat \
            --output_dir=./tllm_checkpoint_1gpu \
            --dtype float16
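
This makes the converted weights float16, matching the float16 gemm/gpt_attention plugins in the build command; without the flag the checkpoint is written in the model's native bfloat16 (see "Set dtype to bfloat16" in the build log), which is where the BFloat16-vs-Half elementwise mismatch comes from.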

hello-11 added the triaged (Issue has been triaged by maintainers) label on Nov 25, 2024