
Add lora_path to chat completion #2438

Merged: 9 commits merged into sgl-project:main from ccchow:chat_lora on Dec 17, 2024
Conversation

@ccchow (Contributor) commented Dec 11, 2024

Motivation

Add lora_path to ChatCompletionRequest for the OpenAI chat completion API. It was previously added to the OpenAI completion API in #2243.

Modifications

Added lora_path to ChatCompletionRequest
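
For reference, a minimal sketch of what the protocol change looks like, assuming the request models in sglang's OpenAI-compatible layer are pydantic models (the exact type annotation below is an assumption, not the PR's verbatim diff):

```python
from typing import List, Optional, Union

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    # Existing OpenAI chat fields, heavily elided for the sketch.
    model: str
    messages: List[dict]
    max_tokens: Optional[int] = None
    # New in this PR: selects which LoRA adapter(s) to apply. A single
    # string for one adapter, or a list for per-request adapters in a batch.
    lora_path: Optional[Union[List[Optional[str]], str]] = None
```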

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@ccchow requested a review from merrymercy on December 13, 2024
@merrymercy merged commit 33c5ff2 into sgl-project:main on Dec 17, 2024
1 of 14 checks passed
@merrymercy (Contributor)

Even though you added this, it seems it is still not used.

@ccchow (Contributor, Author) commented Dec 17, 2024

We are working on a project to provide multi-LoRA serving via the OpenAI-compatible API, and we have validated this fix by adding lora_path to the OpenAI protocol and serving a batch with different LoRA adapters.

@ccchow (Contributor, Author) commented Dec 17, 2024

Thanks for merging this change!

@qingzhong1

Hello, how do I use url = "http://localhost:8000/v1/chat/completions" to request a configured LoRA? For example, with data = {"model": "Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "What is the capital of France?"}]} and LoRA name 'aa'.

@ccchow (Contributor, Author) commented Dec 19, 2024

> Hello, how do I use url = "http://localhost:8000/v1/chat/completions" to request a configured LoRA? For example, with data = {"model": "Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "What is the capital of France?"}]} and LoRA name 'aa'.

curl -X POST http://127.0.0.1:30000/v1/chat/completions \
  -d '{"model": "meta-llama/Llama-3.2-1B",
       "messages": [{"role": "system", "content": "You are a happy assistant that puts a positive spin on everything."},
                    {"role": "user", "content": "I fell off my bike today."}],
       "lora_path": "lora1",
       "max_tokens": 64}'
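
The same call from Python, for convenience (a sketch using the requests library; the URL, model, and lora_path value are the placeholders from the curl command above, and the adapter is assumed to have been registered when the server was launched, e.g. via --lora-paths):

```python
import requests

# Mirrors the curl command above: "lora_path" names a LoRA adapter the
# server already knows about; everything else is a standard OpenAI
# chat completion payload.
resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.2-1B",
        "messages": [
            {"role": "system", "content": "You are a happy assistant that puts a positive spin on everything."},
            {"role": "user", "content": "I fell off my bike today."},
        ],
        "lora_path": "lora1",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```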

@qingzhong1

v1/completions can successfully call LoRA, but v1/chat/completions cannot. Why? Comparing v1_generate_request and v1_chat_generate_request, we found that v1_chat_generate_request does not have the lora_paths variable.

@ccchow (Contributor, Author) commented Dec 19, 2024

> v1/completions can successfully call LoRA, but v1/chat/completions cannot. Why? Comparing v1_generate_request and v1_chat_generate_request, we found that v1_chat_generate_request does not have the lora_paths variable.

You are right. I missed that when cherry-picking the changes. I'll open another PR.
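
For anyone hitting this before the follow-up lands, the gap is that v1_chat_generate_request builds the internal generate request without forwarding the new field, while v1_generate_request does. A self-contained sketch of the missing plumbing, using simplified stand-ins for sglang's real classes (the names follow this thread, but the actual fix in the follow-up PR may differ):

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplified stand-ins for sglang's request types; the real classes live
# in the OpenAI adapter and carry many more fields.

@dataclass
class ChatCompletionRequest:
    model: str
    messages: List[dict]
    lora_path: Optional[str] = None  # the field added by this PR

@dataclass
class GenerateReqInput:
    text: List[str]
    lora_path: Optional[List[Optional[str]]] = None

def v1_chat_generate_request(all_requests: List[ChatCompletionRequest]) -> GenerateReqInput:
    # Trivial prompt rendering just for the sketch; the real code applies
    # the model's chat template to the messages.
    prompts = ["\n".join(m["content"] for m in r.messages) for r in all_requests]
    # The missing plumbing: collect each request's adapter and forward it,
    # mirroring what v1_generate_request already does for /v1/completions.
    lora_paths = [r.lora_path for r in all_requests]
    return GenerateReqInput(text=prompts, lora_path=lora_paths)
```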

@ccchow deleted the chat_lora branch on December 19, 2024
@ccchow (Contributor, Author) commented Dec 19, 2024

> v1/completions can successfully call LoRA, but v1/chat/completions cannot. Why? Comparing v1_generate_request and v1_chat_generate_request, we found that v1_chat_generate_request does not have the lora_paths variable.

#2529
