Why is the vLLM input length here limited to 2048 tokens (Qwen natively supports 32k tokens), and why is there no argument to cap GPU memory? #2782
Comments
Use the --vllm_maxlen argument to change it.
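A minimal sketch of that invocation, reusing the web demo command from the reproduction section below; the model path and the 8192 value are illustrative:

```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --vllm_maxlen 8192
```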
Does vLLM support dynamic context extension methods such as NTK scaling?
@hiyouga That argument works, and changing the code also works. As for the GPU memory limit, I patched vllm_engine.py directly and added enforce_eager=True (see the sketch below).
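A hedged sketch of what such a patch could look like; the surrounding code in vllm_engine.py differs by LLaMA-Factory version, so everything here except the vLLM API itself is illustrative:

```python
# Illustrative engine setup inside vllm_engine.py. AsyncEngineArgs and
# AsyncLLMEngine are real vLLM APIs; the model path and values are placeholders.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="../../../workspace/Llama/Qwen-14B-Chat-Int4",
    max_model_len=8192,          # raise the 2048-token default
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may reserve
    enforce_eager=True,          # add: skip CUDA graph capture to save memory
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```

enforce_eager=True disables CUDA graph capture, trading some speed for lower memory overhead, which is why it helps on memory-constrained GPUs.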
These two arguments have now been added.
How do I pass these two arguments when using vLLM?
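Assuming the new arguments follow the existing --vllm_maxlen naming convention, the call would look like the sketch below; the flag names --vllm_gpu_util and --vllm_enforce_eager are guesses from that convention, so check the project's argument definitions for the exact spelling:

```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --vllm_maxlen 8192 \
    --vllm_gpu_util 0.9 \
    --vllm_enforce_eager
```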
Reminder
Reproduction
```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --enforce_eager
```
Expected behavior
System Info
```
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--enforce_eager']
```
Issues when using vLLM (it does noticeably speed up inference): 1) the GPU memory utilization argument cannot be used; 2) the max_seq_len=2048 token limit also cannot be changed.
Others
No response