get_tokenizer will automatically read the tokenizer from my model_dir and set the pad_token as well as the eos_token.
actual behavior
But it failed to set the pad_token:
[07/16/2024-13:46:30] [TRT-LLM] [W] Logger level already set from environment. Discard new verbosity: error
[07/16/2024-13:46:30] [TRT-LLM] [I] Starting TensorRT-LLM init.
[TensorRT-LLM][INFO] Set logger level by INFO
[07/16/2024-13:46:30] [TRT-LLM] [I] TensorRT-LLM inited.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
Initializing model from /mnt/models/CodeQwen1.5-7B-Chat
[07/16/2024-13:47:14] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:28<00:00, 7.20s/it]
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.bfloat16.
Initializing tokenizer from /mnt/models/CodeQwen1.5-7B-Chat
Traceback (most recent call last):
  File "quantization/quantize.py", line 90, in <module>
    quantize_and_export(
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 289, in quantize_and_export
    tokenizer = get_tokenizer(model_dir,
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 147, in get_tokenizer
    assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"
AssertionError: Pad token for qwen cannot be set!
additional notes
I commented out everything except the AutoTokenizer.from_pretrained() call to get this case to work.
def get_tokenizer(ckpt_path, max_seq_length=2048, model_type=None):
    print(f"Initializing tokenizer from {ckpt_path}")
    tokenizer = AutoTokenizer.from_pretrained(
        ckpt_path,
        model_max_length=max_seq_length,
        padding_side="left",
        trust_remote_code=True,
    )
    # if model_type and model_type == "qwen":
    #     # qwen use token id 151643 as pad and eos tokens
    #     tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
    #     tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)

    # # can't set attribute 'pad_token' for "<unk>"
    # if tokenizer.pad_token != "<unk>":  # nosec B105
    #     tokenizer.pad_token = tokenizer.eos_token
    # if tokenizer.pad_token is None:
    #     tokenizer.pad_token = tokenizer.eos_token
    # assert tokenizer.pad_token is not None, f"Pad token for {model_type} cannot be set!"

    return tokenizer
I know that commenting out these lines will certainly affect the conversion of other models. It seems this function needs a fix to support CodeQwen1.5.
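A less invasive alternative to commenting the block out might be to fill in the pad token only when it is missing, and to use a hardcoded id only as a last resort. The following is a rough sketch under stated assumptions, not the fix adopted in TensorRT-LLM: ensure_pad_token and StubTokenizer are hypothetical names, the stub assumes unknown ids map to None (real tokenizers may behave differently for out-of-range ids), and the sketch drops the original "<unk>" check, which other models may still need.

```python
def ensure_pad_token(tokenizer, fallback_token_id=None):
    """Give the tokenizer a pad token without overwriting tokens it already has."""
    # 1. Respect a pad token that the checkpoint already defines.
    if tokenizer.pad_token is not None:
        return tokenizer
    # 2. Otherwise reuse the tokenizer's own EOS token, if it has one.
    if tokenizer.eos_token is not None:
        tokenizer.pad_token = tokenizer.eos_token
        return tokenizer
    # 3. Last resort: a model-specific id, used only if it maps to a real token.
    if fallback_token_id is not None:
        token = tokenizer.convert_ids_to_tokens(fallback_token_id)
        if token is not None:
            tokenizer.pad_token = token
            tokenizer.eos_token = token
            return tokenizer
    raise ValueError("Pad token cannot be set: no pad, EOS, or valid fallback token")


class StubTokenizer:
    """Made-up stand-in so the sketch runs without loading a model."""

    def __init__(self, pad_token=None, eos_token=None, vocab=None):
        self.pad_token = pad_token
        self.eos_token = eos_token
        self._vocab = vocab or {}

    def convert_ids_to_tokens(self, token_id):
        # Assumption of this stub: ids outside the vocabulary map to None.
        return self._vocab.get(token_id)


# A tokenizer with its own EOS but no pad token keeps its EOS and pads with it,
# even though the fallback id is not in its vocabulary.
tok = ensure_pad_token(StubTokenizer(eos_token="<eos>"), fallback_token_id=151643)
print(tok.pad_token, tok.eos_token)
```

The design choice here is ordering: tokens defined by the checkpoint take priority, so a model whose vocabulary does not contain id 151643 is never forced through the hardcoded path.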
Yuchen-Cao changed the title from "[Quantization] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat" to "[Bug] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat" on Jul 16, 2024
Yuchen-Cao changed the title from "[Bug] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat" to "[Feature] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat" on Jul 16, 2024
System Info
GPU NVIDIA L20
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I am trying to quantize CodeQwen1.5 7B Chat to FP8 using a modified version of the example quantization script:
Expected behavior
The outside quantize.py will use quantize_and_export() to run quantization, and it is defined inside https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize_by_modelopt.py
get_tokenizer will automatically read the tokenizer from my model_dir and set the pad_token as well as the eos_token.