Error when running vLLM inference on a V100 #3717
Comments
I don't know how to add the dtype parameter. It reports the following error: Some keys are not used by the HfArgumentParser: ['dtype']
Please post the complete error message.
CUDA_VISIBLE_DEVICES=0 DTYPE=half llamafactory-cli api /data/hubo/LLaMA-Factory/examples/inference/qwen_vllm.yaml
This still shows the same bug for me.
Please share your complete error message.
template: qwen. The error is as follows:
In principle the dtype is checked here, so it is strange that it has no effect: LLaMA-Factory/src/llmtuner/chat/vllm_engine.py, lines 33 to 35 at commit 3234790
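For context, here is a minimal sketch of how an engine wrapper can forward a configured dtype to vLLM. This is not the actual vllm_engine.py code; the function name and defaults are assumptions. On pre-Ampere GPUs such as the V100 and T4, bfloat16 is unsupported, so the value passed here has to end up as float16/half:

```python
# Hypothetical sketch, not the actual LLaMA-Factory vllm_engine.py source.
from vllm import AsyncEngineArgs, AsyncLLMEngine

def build_vllm_engine(model_path: str, infer_dtype: str = "auto") -> AsyncLLMEngine:
    # vLLM accepts "auto", "half"/"float16", "bfloat16" or "float32" here.
    # On GPUs with compute capability < 8.0 (V100, T4), "bfloat16" raises the
    # error reported in this issue, so "float16" must be passed explicitly
    # instead of relying on "auto".
    engine_args = AsyncEngineArgs(model=model_path, dtype=infer_dtype)
    return AsyncLLMEngine.from_engine_args(engine_args)
```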
Same problem on a T4 GPU, still unresolved. @hiyouga
It still does not take effect. The error is ValueError: Some keys are not used by the HfArgumentParser: ['vllm_dtype']. LLaMA-Factory version 0.8.3.
That answer is outdated and there is no documentation for it. From analyzing the relevant code files, for versions up to 0.8.3 the appropriate YAML setting should be:
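Judging from the follow-up question below, the setting being referred to is infer_dtype. A sketch of such a YAML inference config (the model path is a placeholder) might look like:

```yaml
model_name_or_path: path/to/your/model   # placeholder
template: qwen
infer_backend: vllm
infer_dtype: float16   # force fp16 so that pre-Ampere GPUs (V100/T4) can run vLLM
```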
@zydmtaichi May I ask whether your model itself is bfloat16 or float16? If you want to use infer_dtype: float16, does the model need to be converted to float16 before vLLM loads it, or will vLLM automatically convert bf16 to fp16 while loading the model?
Reminder
Reproduction
Run
CUDA_VISIBLE_DEVICES=0 llamafactory-cli api LLaMA-Factory/examples/inference/qwen_vllm.yaml
Error:
You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half
I saw the same error reported before, but I pulled the latest code and still get this error.
Expected behavior
No response
System Info
No response
Others
No response