
[0.6.1] llama 13b gptq the value update is not the same shape as the original. updated: (2560, 3840), original (5120, 3840) #580

Closed
Slyne opened this issue Dec 6, 2023 · 4 comments


Slyne commented Dec 6, 2023

[Screenshot of the build error: "the value update is not the same shape as the original. updated: (2560, 3840), original (5120, 3840)"]

Looks like it ignores the mapping.tp_rank.
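
For context on the shapes in the error: with tp_size=2, a full weight with 5120 rows should be split into two distinct (2560, 3840) shards, one per mapping.tp_rank. The following is a minimal, hypothetical sketch of that rank-dependent slicing (not TensorRT-LLM's actual weight-loader code; all names are illustrative) just to make the expected shape relationship concrete.

```python
# Minimal sketch, not TensorRT-LLM's loader: slice a row-parallel weight by tp_rank.
import numpy as np

def shard_rows(weight: np.ndarray, tp_size: int, tp_rank: int) -> np.ndarray:
    """Return this rank's contiguous slice along the row dimension."""
    rows = weight.shape[0]
    assert rows % tp_size == 0, "row count must divide evenly across ranks"
    shard = rows // tp_size
    return weight[tp_rank * shard : (tp_rank + 1) * shard, :]

full = np.zeros((5120, 3840), dtype=np.int32)        # stand-in for a packed GPTQ tensor
print(shard_rows(full, tp_size=2, tp_rank=0).shape)  # (2560, 3840)
print(shard_rows(full, tp_size=2, tp_rank=1).shape)  # (2560, 3840), but different rows
```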

@juney-nvidia (Collaborator)

@Slyne

Can you share the full command sequences to reproduce the issue?

Thanks
June

juney-nvidia self-assigned this on Dec 6, 2023
juney-nvidia added the triaged and Low Precision (lower-bit quantization: int8, int4, fp8) labels on Dec 6, 2023
Slyne (Author) commented Dec 6, 2023

> @Slyne
> Can you share the full command sequences to reproduce the issue?
> Thanks June

Step 1. Get Llama 2 13B from Meta.
Step 2. Convert Llama 2 13B with the Hugging Face script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
Step 3. Follow the GPTQ instructions in the llama example under TensorRT-LLM to get llama-13b-4bit-gs128.safetensors.
Step 4. Run the command below:

python build.py --model_dir /llama_hf/13B/ \
                --quant_ckpt_path ./GPTQ-for-LLaMa/llama-13b-4bit-gs128.safetensors \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --use_inflight_batching \
                --paged_kv_cache \
                --use_rmsnorm_plugin \
                --use_weight_only \
                --weight_only_precision int4_gptq \
                --per_group \
                --world_size 2 \
                --tp_size 2 \
                --max_input_len 1900 \
                --max_output_len 64 \
                --output_dir ./tmp/llama/13B/trt_engines/int4_GPTQ/2-gpu/
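
(Not part of the original report: before running build.py, one way to sanity-check the GPTQ checkpoint is to dump its tensor shapes and confirm they are the full, unsharded sizes that build.py is expected to slice per rank. A minimal sketch, assuming the safetensors Python package and the checkpoint path used above:)

```python
# Hypothetical check, not from TensorRT-LLM: print tensor names and shapes
# in the GPTQ checkpoint so they can be compared against the error message.
from safetensors.numpy import load_file

ckpt = load_file("./GPTQ-for-LLaMa/llama-13b-4bit-gs128.safetensors")
for name, tensor in ckpt.items():
    print(f"{name}: {tensor.shape} {tensor.dtype}")
```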

@Barry-Delaney (Collaborator)

@Slyne thanks for the feedback. We have fixed this internally; the fix will be included in a future update to the main branch.

@juney-nvidia (Collaborator)

@Slyne

Closing this since it has already been fixed in the main branch. If anything is still missing, please open a new issue to track it.

Thanks,
June
