Why is the vLLM input length here limited to 2048 tokens (Qwen natively supports 32k tokens), and why is there no argument to cap GPU memory? #2782
Comments
Use the --vllm_maxlen argument to change it.
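A minimal sketch of that invocation, reusing the web demo command from the reproduction section below; the model path and the 8192 value are illustrative:

```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --vllm_maxlen 8192
```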
Does vLLM support dynamic context extension methods such as NTK scaling?
@hiyouga That argument works, and changing the code also works. As for the GPU memory limit, I patched vllm_engine.py directly and added enforce_eager=True (see the sketch below).
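A hedged sketch of what such a patch could look like; the surrounding code in vllm_engine.py differs by LLaMA-Factory version, so everything here except the vLLM API itself is illustrative:

```python
# Illustrative engine setup inside vllm_engine.py. AsyncEngineArgs and
# AsyncLLMEngine are real vLLM APIs; the model path and values are placeholders.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="../../../workspace/Llama/Qwen-14B-Chat-Int4",
    max_model_len=8192,          # raise the 2048-token default
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may reserve
    enforce_eager=True,          # add: skip CUDA graph capture to save memory
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```

enforce_eager=True disables CUDA graph capture, trading some speed for lower memory overhead, which is why it helps on memory-constrained GPUs.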
These two arguments have now been added.
How do I pass these two arguments when using vLLM?
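Assuming the new arguments follow the existing --vllm_maxlen naming convention, the call would look like the sketch below; the flag names --vllm_gpu_util and --vllm_enforce_eager are guesses from that convention, so check the project's argument definitions for the exact spelling:

```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --vllm_maxlen 8192 \
    --vllm_gpu_util 0.9 \
    --vllm_enforce_eager
```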
Reminder
Reproduction
```bash
python src/web_demo.py \
    --model_name_or_path ../../../workspace/Llama/Qwen-14B-Chat-Int4 \
    --template qwen \
    --infer_backend vllm \
    --enforce_eager
```
Expected behavior
System Info
```
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--enforce_eager']
```
Issues when using vLLM (it does noticeably speed up inference): 1) the GPU memory utilization argument cannot be used; 2) the max_seq_len=2048 token limit also cannot be changed.
Others
No response