Reminder

Reproduction
CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 API_PORT=7860 python src/api_demo.py --model_name_or_path qwen/Qwen-72B-Chat-Int4 --template qwen

Expected behavior
The OpenAI chat completion API supports a stop parameter that can be used for early stopping, but the current API does not seem to support it. Please consider adding it to avoid unnecessary inference (see the client-side sketch below).

System Info
No response

Others
No response
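For reference, a minimal client-side sketch of the request this issue asks the server to honor, assuming the openai>=1.0 Python client and that api_demo.py exposes the usual /v1/chat/completions route on the port above; the api_key and model values are placeholders:

# Hypothetical usage: the `stop` list is the parameter this issue asks the server to respect.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:7860/v1", api_key="none")  # local api_demo.py server

resp = client.chat.completions.create(
    model="qwen/Qwen-72B-Chat-Int4",                   # placeholder; the server may ignore this field
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
    stop=["<|endoftext|>"],                            # early-stop strings
)
print(resp.choices[0].message.content)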
"do_sample": false, "temperature": 0.0, "top_p": 0, "n": 1, "max_tokens": 128, "stream": false, "stop": "<|endoftext|>"
我在API 请求中,设置了 stop, 也是没有生效;直到达到了模型生成的最大长度后,才停止生成。
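For context, a reproduction sketch of that request using the requests library, assuming the local server above and the standard /v1/chat/completions route (model name and prompt are placeholders):

# Hypothetical reproduction: the `stop` field below is the one reported to be ignored.
import requests

payload = {
    "model": "qwen/Qwen-72B-Chat-Int4",               # placeholder; the server may ignore this field
    "messages": [{"role": "user", "content": "Hello"}],
    "do_sample": False,
    "temperature": 0.0,
    "top_p": 0,
    "n": 1,
    "max_tokens": 128,
    "stream": False,
    "stop": "<|endoftext|>",
}
resp = requests.post("http://localhost:7860/v1/chat/completions", json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])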
@JieShenAI Not supported yet.
Stop parameter is not supported by the huggingface engine yet. When will the huggingface engine's inference API support stop?
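For anyone experimenting in the meantime, a minimal sketch (not the project's actual implementation) of honoring stop strings with the HuggingFace generation API via a custom StoppingCriteria; the model name is taken from the reproduction above, and loading it requires the usual Qwen/GPTQ dependencies:

# Hypothetical server-side approach: stop generation once any stop string appears in the output.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

class StopOnStrings(StoppingCriteria):
    """Return True (stop) once any of the given strings appears in the newly generated text."""
    def __init__(self, stop_strings, tokenizer, prompt_length):
        self.stop_strings = stop_strings
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        generated = self.tokenizer.decode(
            input_ids[0][self.prompt_length:], skip_special_tokens=False
        )
        return any(s in generated for s in self.stop_strings)

tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-72B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen-72B-Chat-Int4", trust_remote_code=True, device_map="auto"
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
criteria = StoppingCriteriaList(
    [StopOnStrings(["<|endoftext|>"], tokenizer, inputs["input_ids"].shape[1])]
)
outputs = model.generate(**inputs, max_new_tokens=128, stopping_criteria=criteria)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))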