changelog : llama-server REST API #9291
Comments
Not a REST API breaking change, but server-related: some environment variables were changed in #9308.
After #9398, in the completion response |
Breaking change #9776 : better security control for public deployments
Please note that GET |
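To illustrate what the API key check means for a client, here is a minimal sketch that builds (but does not send) a request carrying a key. It assumes the key is passed as an OAI-style `Authorization: Bearer` header; the helper name `build_request`, the URL, and the key value are hypothetical and only serve the example.

```python
import json
import urllib.request
from typing import Optional


def build_request(url: str, payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST request, adding a bearer token when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the server expects an OAI-style bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_request(
    "http://localhost:8080/v1/completions",  # assumed local server address
    {"prompt": "Hello", "max_tokens": 8},
    api_key="my-secret-key",  # hypothetical key
)
print(req.get_header("Authorization"))
```

The request object can then be passed to `urllib.request.urlopen` once a server is running. Leaving `api_key` unset produces a request without the header, which is what a server started without a key would accept.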
Breaking change for
Was the |
For security reasons, "/slots" has been disabled by default since #9776, as mentioned in the breaking changes table. I just forgot to update the docs.
Not an API change, but maybe good to know that the default web UI for
If you want to use the old completion UI, please follow the instructions in the PR.
For clarification, we will maintain OAI-compat for all API under
NOTE: OAI support for |
Behavior of |
Added OAI-compat support for
If you want to use it with a downstream library, be sure to add

from openai import OpenAI

# Point the OpenAI client at the local llama-server
client = OpenAI(api_key="dummy", base_url="http://localhost:8080/v1")
res = client.completions.create(
    model="davinci-002",
    prompt="I believe the meaning of life is",
    max_tokens=8,
)

If you want to use the old non-OAI style, remove the
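The changelog in this issue also notes that the OAI stream response ends with a `[DONE]` marker. As a rough sketch of how a client might consume such a stream, the function below reads server-sent-event lines and stops at that marker. The helper name `iter_stream_chunks` and the sample chunks are hypothetical; the chunk shape is assumed to follow the OAI completions format.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)


# Hypothetical stream content, shaped like OAI completion chunks.
sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    "",
    'data: {"choices": [{"text": " world"}]}',
    "",
    "data: [DONE]",
    "",
]
text = "".join(c["choices"][0]["text"] for c in iter_stream_chunks(sample))
print(text)
```

Checking for the sentinel before calling `json.loads` matters here, since `[DONE]` is not a JSON object in the chunk format and should not be treated as one.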
Overview
This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.
If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.
See also: libllama API
Recent API changes (most recent at the top)
/v1/completions is now OAI-compat
logprobs is now OAI-compat, default to pre-sampling probs
/embeddings supports pooling type none
"tokens" output to /completions endpoint
penalize_nl
/slots and /props responses
/slots and /props responses
/slots endpoint: remove slot[i].state, add slot[i].is_processing
/slots is now disabled by default
Endpoints now check for API key if it's set
/rerank endpoint
[DONE]\n\n in OAI stream response to match spec
seed_cur to completion response
/health and /slots
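For the /slots change listed above (state removed, is_processing added), a client that used to inspect the slot state could switch to the new flag roughly as follows. This is only a sketch: it assumes the endpoint returns a JSON array of slot objects, and the helper name `busy_slots` and the sample payload are hypothetical. Only the `is_processing` field comes from the changelog.

```python
from typing import Any, Dict, List


def busy_slots(slots: List[Dict[str, Any]]) -> List[int]:
    """Return the indices of slots whose is_processing flag is true."""
    return [i for i, slot in enumerate(slots) if slot.get("is_processing", False)]


# Hypothetical payload; a real response carries more fields per slot.
example = [
    {"is_processing": False},
    {"is_processing": True},
    {"is_processing": False},
]
print(busy_slots(example))
```

Using `slot.get` with a default keeps the helper from raising on a payload that lacks the field, for instance one produced by a server version from before the change.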
For older changes, use:
Upcoming API changes