Merging the LoRA adapter and quantizing with AutoGPTQ: inference with the quantized model fails with ValueError: torch.bfloat16 is not supported for quantization method gptq. Supported dtypes: [torch.float16] #4677
Labels
solved
Reminder
System Info
1
Reproduction
1
Expected behavior
Merge the LoRA adapter with the model
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
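For context, the merge config referenced above (examples/merge_lora/llama3_lora_sft.yaml) typically looks roughly like the sketch below. This is a hedged sketch, not the exact file from this report: key names may differ across LLaMA-Factory versions, and the Qwen2-specific values shown in comments are assumptions.

```yaml
### model (sketch of a merge_lora config; exact keys may vary by version)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct   # for Qwen2, e.g. Qwen/Qwen2-7B-Instruct (assumed)
adapter_name_or_path: saves/llama3-8b/lora/sft             # LoRA checkpoint produced by fine-tuning (assumed path)
template: llama3                                           # use the qwen template for a Qwen2 model
finetuning_type: lora

### export
export_dir: models/llama3_lora_sft    # merged full-precision model is written here
export_size: 2
export_device: cpu
export_legacy_format: false
```

The merged model written to export_dir is what the GPTQ step below consumes.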
Quantize the model with AutoGPTQ
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
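The GPTQ export config (examples/merge_lora/llama3_gptq.yaml) then points at the merged model and adds the quantization settings. Again a hedged sketch: the bit width and calibration dataset shown are typical example values, not confirmed from this report.

```yaml
### model
model_name_or_path: models/llama3_lora_sft       # merged model from the previous step (assumed path)
template: llama3                                  # qwen for a Qwen2 model

### export
export_dir: models/llama3_gptq
export_quantization_bit: 4                        # GPTQ bit width
export_quantization_dataset: data/c4_demo.json    # calibration dataset used by AutoGPTQ
export_size: 2
export_device: cpu
export_legacy_format: false
```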
I ran these two official commands to merge the adapter into my fine-tuned Qwen2 model and then quantize it. The quantized model's config.json comes out at over 200,000 lines, and running inference with the old api_demo.py then fails with the following error:
ValueError: torch.bfloat16 is not supported for quantization method gptq. Supported dtypes: [torch.float16]
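The error indicates that a GPTQ-quantized checkpoint can only be loaded in torch.float16, while the old api_demo.py is loading it in torch.bfloat16. A minimal workaround sketch follows, assuming the installed LLaMA-Factory version exposes the infer_dtype model argument (this is an assumption; if it does not, loading the model with torch_dtype=torch.float16 directly via transformers achieves the same thing). The file name is hypothetical.

```yaml
# qwen2_gptq_infer.yaml (hypothetical name), e.g. for `llamafactory-cli api qwen2_gptq_infer.yaml`
model_name_or_path: models/llama3_gptq   # path to the GPTQ-quantized model (assumed)
template: llama3                          # qwen for a Qwen2 model
infer_dtype: float16                      # force fp16 so the GPTQ checkpoint loads (assumed argument name)
```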
Others
Please help resolve this issue.