-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenAI Tools / function calling v2 #3237
Conversation
Testing the tools/function calls feature using the example provided (in openai_tools_calls.py) it works fine as is, but when you go further in the chat and ask another question that needs a function calling, it never goes to tools but the assistant would reply 'call_get_current_weather_0 was called with arguments .... ' but not using tool_calls. |
Can we go further on reducing the templating to purely JSON schema? I believe it is possible by framing it as { |
thank you for your work. |
@simon-mo : I so removed everything about the jinja template. @mn9891 : I suspect a bad model for this use case. I started my development with NeuralHermes-7B, but there's a recent model based on Mistral 7B developed by NousResearch that has been trained to call functions which is really great: @Uhao-P : CompletionRequest is not supposed to call functions. |
@FlorianJoncour could you share a sample of a fully formatted prompt containing tools? Say, for a Mistral model? |
Thank you, there's an edge case I hadn't considered. It's possible that the model generates a list of function calls in JSON rather than making the calls one after the other. Now, I was able to make the example script work with Mistral-7B-Instruct-v0.2 despite the fact that it's not trained for this. Edit: I forgot something. The default chat template in the Mistral-7B model enforces an alternating user/assistant text pattern. Since there can be consecutive function calls, the template will raise an error. |
When I use a Python script to make a function call, it succeeds regardless of whether the stream is set to true or false. However, when I integrate with the frontend, and the stream is set to true, the frontend doesn't parse the received results well, and the |
I am eagerly awaiting this too. Is there any area where contributions would be welcomed to help merge this? |
hi, is this PR being worked on? |
hope this feature release as soon as possible. |
Is it possible to share a fully formatted prompt sample? That helps a lot anyone fine tuning models... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FlorianJoncour left some comments, but at a more meta level than any individual line, could you describe how this PR enables responding with multiple function calls (like OpenAI's API)?
OpenAI's API has a weakness where it behaves poorly in streaming mode when it uses the "parallel.multi-tool-use" function as a wrapper, which breaks streaming behavior. I want to make sure as a consumer of the tool API there, that:
- This implementation doesn't duplicate that behavior or breaking streaming.
- I understood what the response chunks look like when the model makes multiple calls - I didn't see that exactly, but I did just skim the PR.
Could you point me to what multi calling looks like?
@@ -125,6 +133,12 @@ def parse_args(): | |||
type=str, | |||
default=None, | |||
help="The file path to the SSL cert file") | |||
parser.add_argument( | |||
"--privileged", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this engine arg is not sufficiently descriptive. To me, "privileged" evokes a notion of requiring root or extra capabilities (a la docker and containers, or user account control in Windows, or admin elevation in macOS.)
This is probably more aptly described as --enable-debug-reload-api
?
@@ -163,6 +206,16 @@ async def health() -> Response: | |||
return Response(status_code=200) | |||
|
|||
|
|||
if "--privileged" in sys.argv: | |||
|
|||
@app.get("/privileged") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, I don't think /privileged
describes what this request route does.
And should this be a POST, not a GET, as it is an effectful operation?
logger.warning( | ||
"\n" | ||
"##########################################################################\n" | ||
"privileged mode enabled. This should only be used for development purpose.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well - "privileged" just doesn't describe to me what is happening.
is the holdup the naming convention or what am i missing ? |
I think it's more that there are other PRs open on this (e.g. #4656 ) but all of them have shortcomings and are pretty opinionated in one way or another. Tool choice for non-auto tool calling is now supported via guided decoding, but "auto" tool choice is a lot harder to get right because different models use different tool choice prompt templates, use different tokens for indicating tool calls, etc. all of which have to be parsed based on the model-specific format and (ideally) streamed back to the client in an OpenAI API-compatible way, which none of the current PRs fully support |
https://github.com/mistralai/mistral-common/tree/main/src/mistral_common/protocol/instruct should be a decent starting ground / if noone can agree |
I'm trying to work on a PR for an implementation that's less opinionated and would work with Mistral 7B Instruct v0.3, as well as the Hermes 2 Pro models by Nous Research & other tool-calling-capable open models in #5649 |
This PR follows #2488
The implementation has been updated to use the new guided generation.
If during a query, the user sets tool_choice to
auto
, the server will use the template system used in #2488.However, if a specific function is defined, guided generation will be used to only generate the parameters.
Everything is detailed in the
openai_tools_calls.py
example.