
Add lora_paths to v1_chat_generate_request #2529

Merged
1 commit merged into sgl-project:main on Dec 22, 2024

Conversation

@ccchow (Contributor) commented on Dec 19, 2024

Motivation

Add the missing lora_paths handling to the v1_chat_generate_request function, following up on PR #2438.
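
To illustrate the intent of the change, here is a minimal sketch of how a per-request LoRA adapter could be propagated from an OpenAI-style chat request into the internal generate request. This is not the actual sglang code; the ChatCompletionRequest and GenerateReqInput classes below are simplified stand-ins, and the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChatCompletionRequest:
    """Simplified OpenAI-style chat request; only the fields relevant here."""
    model: str
    messages: List[dict]
    lora_path: Optional[str] = None  # adapter name registered via --lora-paths

@dataclass
class GenerateReqInput:
    """Simplified internal generate request handed to the scheduler."""
    text: str
    sampling_params: dict = field(default_factory=dict)
    lora_path: Optional[str] = None

def v1_chat_generate_request(request: ChatCompletionRequest) -> GenerateReqInput:
    # Flatten the chat messages into a single prompt (the real adapter applies
    # the model's chat template here).
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in request.messages)
    return GenerateReqInput(
        text=prompt,
        sampling_params={},
        # The point of this PR: forward the requested LoRA adapter instead of
        # silently dropping it when building the generate request.
        lora_path=request.lora_path,
    )
```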

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@qingzhong1 commented:
Hello, I have a question. I started the service on an A800 with the command below. Why does a request to /v1/chat/completions take 14 seconds to get a response? Is there any way to speed it up?

python -m sglang.launch_server --model-path "" \
  --host 0.0.0.0 \
  --port 8000 \
  --tp-size 1 \
  --mem-fraction-static 0.5 \
  --served-model-name "Qwen2.5-7B-Instruct" \
  --chunked-prefill-size 64 \
  --disable-cuda-graph \
  --disable-radix-cache \
  --lora-paths lora0="" \
  --max-loras-per-batch 4
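
For measuring the latency described above, one option is to time a request against the OpenAI-compatible endpoint the command starts. The sketch below assumes the server is reachable at port 8000 as launched; the lora_path field in the request body is an assumption about how the adapter registered as lora0 would be selected, not a confirmed part of the API.

```python
import time
import requests

# Minimal end-to-end latency check against the OpenAI-compatible endpoint.
# "lora_path" is an assumed field for selecting the adapter registered as
# "lora0" via --lora-paths; adjust to whatever the server actually accepts.
payload = {
    "model": "Qwen2.5-7B-Instruct",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 64,
    "lora_path": "lora0",
}

start = time.perf_counter()
resp = requests.post("http://0.0.0.0:8000/v1/chat/completions", json=payload, timeout=120)
elapsed = time.perf_counter() - start

print(f"status={resp.status_code} latency={elapsed:.2f}s")
# Response is expected to follow the OpenAI chat completions schema.
print(resp.json()["choices"][0]["message"]["content"])
```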

@merrymercy merged commit 19ba2b0 into sgl-project:main on Dec 22, 2024
15 checks passed
chosen-ox pushed a commit to chosen-ox/sglang that referenced this pull request Dec 22, 2024