
qserve convert checkpoint raises an error #2507

Open
anaivebird opened this issue Nov 27, 2024 · 1 comment
Labels
bug Something isn't working

Comments


anaivebird commented Nov 27, 2024

System Info

  • GPU: NVIDIA H100 80G
  • TensorRT-LLM branch: main
  • TensorRT-LLM commit: 535c9cc

Who can help?

@Tracin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir ./llama2-7b
git clone https://github.com/mit-han-lab/deepcompressor

cd /root/deepcompressor

conda env create -f environment.yml
poetry install

python -m deepcompressor.app.llm.ptq \
    examples/llm/configs/qoq-g128.yaml \
    --model-name llama-2-7b --model-path /root/llama2-7b \
    --smooth-proj-alpha 0 --smooth-proj-beta 1 \
    --smooth-attn-alpha 0.5 --smooth-attn-beta 0 \
    --save-model /root/quantized-llama2-7b

export TRTLLM_DISABLE_UNIFIED_CONVERTER=1
python convert_checkpoint.py --model_dir /root/llama2-7b \
                             --output_dir /root/trtllm-llama2-7b  \
                             --dtype float16  \
                             --quant_ckpt_path  /root/quantized-llama2-7b \
                             --use_qserve  \
                             --per_group  \
                             --tp_size 1

Expected behavior

No error.

Actual behavior


user@/app/tensorrt_llm/examples/llama$ export TRTLLM_DISABLE_UNIFIED_CONVERTER=1
python convert_checkpoint.py --model_dir /root/llama2-7b \
                             --output_dir /root/trtllm-llama2-7b  \
                             --dtype float16  \
                             --quant_ckpt_path  /root/quantized-llama2-7b \
                             --use_qserve  \
                             --per_group  \
                             --tp_size 1

[TensorRT-LLM] TensorRT-LLM version: 0.16.0.dev2024111900
0.16.0.dev2024111900
[11/27/2024-11:19:05] [TRT-LLM] [I] Loading weights from lmquant torch checkpoint for QServe W4A8 inference...
[11/27/2024-11:19:12] [TRT-LLM] [I] Processing weights in layer: 0
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 555, in <module>
    main()
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 547, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 488, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 495, in execute
    f(args, rank)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 472, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 416, in from_hugging_face
    weights = load_weights_from_lmquant(quant_ckpt_path, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 2086, in load_weights_from_lmquant
    process_weight_and_params(qkv, f'{tllm_prex}.attention.qkv'))
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 2015, in process_weight_and_params
    qweight = qserve_quantize_weight_per_group(weight, s1_scales,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize.py", line 328, in qserve_quantize_weight_per_group
    linear_weight.max() <= 15), "Stage 2: Quantized weight out of range"
AssertionError: Stage 2: Quantized weight out of range

Additional notes

None.

anaivebird added the bug label on Nov 27, 2024

bobboli commented Nov 27, 2024

Hi, this is due to the update from lmquant to deepcompressor. We have updated our conversion scripts accordingly, and they will be merged in the next release.

For now, if you want to try QServe, please use the old lmquant version: https://github.com/mit-han-lab/deepcompressor/blob/lmquant-v0.0.0-deprecated/projects/llm/README.md
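
For reference, a minimal sketch of that workaround (the branch name and path are taken from the linked README's URL; the exact install and quantization commands are whatever that branch's README specifies):

git clone https://github.com/mit-han-lab/deepcompressor
cd deepcompressor
git checkout lmquant-v0.0.0-deprecated   # the old lmquant code lives on this branch
cd projects/llm
# Install and run the quantization by following this branch's README, saving the
# W4A8 QoQ checkpoint to a directory such as /root/quantized-llama2-7b, then
# re-run convert_checkpoint.py with --quant_ckpt_path pointing at that directory,
# exactly as in the reproduction above.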
