
Error when running vLLM inference on a V100 #3717

Closed
1 task done
Pobby321 opened this issue May 13, 2024 · 12 comments
Labels
solved This problem has been already solved

Comments

Pobby321 commented May 13, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Run:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli api LLaMA-Factory/examples/inference/qwen_vllm.yaml
It fails with:
You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half
I saw the same error reported in earlier issues, but I am on the latest code and still hit it.
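
(For reference, the qwen_vllm.yaml referenced above presumably looks roughly like the sketch below; the model path is a placeholder, and only the keys quoted later in this thread are confirmed.)

model_name_or_path: Qwen/Qwen-7B-Chat  # placeholder, not confirmed by this issue
template: qwen
infer_backend: vllm
vllm_enforce_eager: true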

Expected behavior

No response

System Info

No response

Others

No response

Pobby321 (Author) commented:
I don't know how to add the dtype parameter; it reports the following error: Some keys are not used by the HfArgumentParser: ['dtype']

hiyouga (Owner) commented May 13, 2024

Please post the complete error message.

@hiyouga hiyouga added the pending This problem is yet to be addressed label May 13, 2024
flyrae commented May 13, 2024

CUDA_VISIBLE_DEVICES=0 DTYPE=half llamafactory-cli api /data/hubo/LLaMA-Factory/examples/inference/qwen_vllm.yaml
You can try running it like this; adding DTYPE=half worked for me.

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels May 13, 2024
Pobby321 (Author) commented:

CUDA_VISIBLE_DEVICES=0 DTYPE=half llamafactory-cli api /data/hubo/LLaMA-Factory/examples/inference/qwen_vllm.yaml
You can try running it like this; adding DTYPE=half worked for me.

I still get this error:
ValueError: Some keys are not used by the HfArgumentParser: ['dtype']

hiyouga (Owner) commented May 14, 2024

Please share your full error traceback.

Pobby321 (Author) commented:

  • The Qwen model's config sets "torch_dtype": "bfloat16", which causes this error on the V100. If I manually change it to float16, the error goes away, but that does not seem like the right fix and I am worried it may affect precision.

  • Without changing the model config, running the vLLM inference script as usual fails with the traceback below (the capability check behind this error is sketched after this comment):

    File "/home/root/miniconda3/envs/vllm2/bin/llamafactory-cli", line 8, in
    sys.exit(main())
    File "/data/root/LLaMA-Factory/src/llmtuner/cli.py", line 41, in main
    run_api()
    File "/data/root/LLaMA-Factory/src/llmtuner/api/app.py", line 103, in run_api
    chat_model = ChatModel()
    File "/data/root/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 28, in init
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
    File "/data/root/LLaMA-Factory/src/llmtuner/chat/vllm_engine.py", line 68, in init
    self.model = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(**engine_args))
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
    engine = cls(
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 324, in init
    self.engine = self._init_engine(*args, **kwargs)
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
    return engine_class(*args, **kwargs)
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 160, in init
    self.model_executor = executor_class(
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 41, in init
    self._init_executor()
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
    self._init_non_spec_worker()
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 68, in _init_non_spec_worker
    self.driver_worker.init_device()
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/worker/worker.py", line 104, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
    File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/vllm/worker/worker.py", line 324, in _check_if_gpu_supports_dtype
    raise ValueError(
    ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100S-PCIE-32GB GPU has compute capability 7.0. You can use float16 instead by explicitly setting thedtype flag in CLI, for example: --dtype=half.

  • If I instead add the dtype parameter to the yaml file, like this:

template: qwen
infer_backend: vllm
vllm_enforce_eager: true
dtype: half

then the error is:
File "/home/root/miniconda3/envs/vllm2/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/data/root/LLaMA-Factory/src/llmtuner/cli.py", line 41, in main
run_api()
File "/data/root/LLaMA-Factory/src/llmtuner/api/app.py", line 103, in run_api
chat_model = ChatModel()
File "/data/root/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 24, in init
model_args, data_args, finetuning_args, generating_args = get_infer_args(args)
File "/data/root/LLaMA-Factory/src/llmtuner/hparams/parser.py", line 305, in get_infer_args
model_args, data_args, finetuning_args, generating_args = _parse_infer_args(args)
File "/data/root/LLaMA-Factory/src/llmtuner/hparams/parser.py", line 117, in _parse_infer_args
return _parse_args(parser, args)
File "/data/root/LLaMA-Factory/src/llmtuner/hparams/parser.py", line 42, in _parse_args
return parser.parse_yaml_file(os.path.abspath(sys.argv[1]))
File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/transformers/hf_argparser.py", line 423, in parse_yaml_file
outputs = self.parse_dict(yaml.safe_load(Path(yaml_file).read_text()), allow_extra_keys=allow_extra_keys)
File "/home/root/miniconda3/envs/vllm2/lib/python3.9/site-packages/transformers/hf_argparser.py", line 377, in parse_dict
raise ValueError(f"Some keys are not used by the HfArgumentParser: {sorted(unused_keys)}")
ValueError: Some keys are not used by the HfArgumentParser: ['dtype']
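
For context, the vLLM check that raises the Bfloat16 error above boils down to comparing the GPU's compute capability against 8.0. A minimal sketch of the same test (assuming PyTorch and a CUDA device are available; an approximation, not the vLLM source):

import torch

def gpu_supports_bfloat16() -> bool:
    # bfloat16 requires compute capability >= 8.0 (Ampere or newer);
    # a V100 (7.0) or a T4 (7.5) fails this check, hence the ValueError above.
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 0)

print("bfloat16 supported:", gpu_supports_bfloat16())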

hiyouga (Owner) commented May 15, 2024

In principle the dtype is already handled here, so it is strange that it does not take effect:

config = load_config(model_args) # may download model from ms hub
infer_dtype = infer_optim_dtype(model_dtype=getattr(config, "torch_dtype", None))
infer_dtype = str(infer_dtype).split(".")[-1]
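
For reference, infer_optim_dtype is expected to fall back to float16 when the GPU cannot run bfloat16; a rough sketch of that behavior (an approximation based on this thread, not the exact LLaMA-Factory source):

import torch
from transformers.utils import is_torch_bf16_gpu_available

def infer_optim_dtype(model_dtype: torch.dtype) -> torch.dtype:
    # Keep bfloat16 only when the current GPU supports it (compute capability >= 8.0);
    # otherwise fall back to float16, which is what a V100 or T4 needs.
    if model_dtype == torch.bfloat16 and is_torch_bf16_gpu_available():
        return torch.bfloat16
    return torch.float16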

Shame-fight commented:
Same problem here on a T4 GPU, still unsolved. @hiyouga

@hiyouga hiyouga added pending This problem is yet to be addressed and removed solved This problem has been already solved labels Jun 4, 2024
@hiyouga hiyouga reopened this Jun 4, 2024
hiyouga added a commit that referenced this issue Jun 5, 2024
hiyouga (Owner) commented Jun 5, 2024

vllm_dtype: float16

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 5, 2024
@hiyouga hiyouga closed this as completed Jun 5, 2024
zydmtaichi commented:
vllm_dtype: float16

This still does not take effect; I get ValueError: Some keys are not used by the HfArgumentParser: ['vllm_dtype']

LLaMA-Factory version 0.8.3

zydmtaichi commented:
vllm_dtype: float16

This answer is outdated and there is no documentation for it. From reading the relevant code, for versions up to 0.8.3 the correct yaml setting is infer_dtype: float16. The same applies to issue #3387.
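
Under that reading, the yaml from earlier in this thread would become something like the following (the model path is illustrative, not from this issue):

model_name_or_path: Qwen/Qwen-7B-Chat  # illustrative; use your own checkpoint path
template: qwen
infer_backend: vllm
vllm_enforce_eager: true
infer_dtype: float16  # forces fp16 so vLLM does not attempt bfloat16 on a V100/T4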

mces89 commented Aug 9, 2024

@zydmtaichi Is your model itself bfloat16 or float16? If I want to use infer_dtype: float16, do I need to convert the weights to float16 before vLLM loads the model, or will vLLM automatically cast bf16 to fp16 while loading?
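
For the first part of that question, one way to see which dtype a checkpoint declares is to read its config with transformers (a small sketch; the model id is a placeholder):

from transformers import AutoConfig

# Prints e.g. torch.bfloat16 when the checkpoint's config.json declares "torch_dtype": "bfloat16".
config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)  # placeholder model id
print(config.torch_dtype)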
