[BUG] Tool Calling not working for Llama 3.2 3B #234
To add further material to this: I get the response below when prompting TabbyAPI with tools and a tool-use prompt: TabbyAPI Response: { The finish reason and stop string don't necessarily look correct to me, but I'm not sure. It looks like the reply is being cut off right as the model starts to formulate the tool call? I could be wrong...
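For comparison, a working OpenAI-compatible tool call should come back with finish_reason set to "tool_calls" and the call itself under message.tool_calls. A rough sketch of that shape with placeholder values, not TabbyAPI's actual output:

```python
# Rough shape of a successful OpenAI-style tool-call response
# (illustrative placeholder values, not TabbyAPI's exact output):
expected_response = {
    "choices": [
        {
            "finish_reason": "tool_calls",  # not "stop" with a truncated reply
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_abc123",  # placeholder id
                        "type": "function",
                        "function": {
                            "name": "create_log_entry",
                            "arguments": "{\"Logbook Title\": \"...\"}",
                        },
                    }
                ],
            },
        }
    ]
}
```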
One last thing. Indeed, I just had Cline (Claude Dev) swap my backend from ExLlamav2 + TabbyAPI to Ollama, and my function calls are working again, almost 100% accurate. So this is not a config or model issue; it seems to be something else.
Same here with Qwen 2.5 Coder 32B and Qwen 72B.
I took a quick look at the OAI tool-calling docs and the relevant code in tabby, since I'm not familiar with how it's implemented here; my findings are below:
Yeah, confirmed: tool calling is broken right now and probably needs to be looked at again.
Based on what DocShotgun said, tool calling in its current state is broken and those template renders need to be fixed. However, there's a bigger issue at play here. Tool calling (and vision by proxy) is not standardized. There are a couple of ways to harness tool calling:

1. Hardcode support for each model's tool-calling format in the server itself, updated by the devs as new models appear.
2. Let users define tool calling in their own chat templates.
In my (and Ollama's) opinion, 2 is the best way to handle situations like these. It gives users the freedom to make their own templates instead of expecting the devs to support a random new model on day 1. Ollama is not just a model-running tool; it's an ecosystem. It provides a centralized repo of models (basically an HF mirror) along with templates written for its own templating system. TabbyAPI, by contrast, is a decentralized system by design: users can run any model provided it's in the exl2 format (or otherwise supported by the exllama runtime). Being decentralized reduces the maintenance burden on the devs, since TabbyAPI is a hobby project, and anyone can create a template which supports tool calling. Take a look at the docs written by @gittb; he did a great job. If someone does make a tool template for L3.2 based on the official one, feel free to PR it to the templates repository.
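To illustrate option 2, here is a minimal sketch of a template injecting a tool list into a Llama 3 style system prompt. It uses plain Jinja2; the variable names (`system`, `tools`) and the wrapper text are assumptions for this sketch, not TabbyAPI's actual template contract (see @gittb's docs for the real thing):

```python
# Minimal sketch of a tool-aware chat template (assumed variable names,
# not TabbyAPI's real template API). Requires: pip install jinja2
import json
from jinja2 import Template

TOOL_TEMPLATE = Template(
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "{{ system }}\n"
    "{% if tools %}You have access to the following functions:\n"
    "{{ tools }}\n{% endif %}"
    "<|eot_id|>"
)

tools = [{"type": "function",
          "function": {"name": "create_log_entry",
                       "description": "Create a new log entry"}}]

# Render with tools present; pass tools=None to fall back to plain chat.
print(TOOL_TEMPLATE.render(system="You are a helpful AI assistant.",
                           tools=json.dumps(tools, indent=2)))
```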
OS
Windows
GPU Library
CUDA 12.x
Python version
3.11
Describe the bug
I am using WSL2 Docker with CUDA. Regular text generation runs really well and as expected, but tool calling doesn't work even when the exact name of the tool is in the prompt. I am using the default prompt template that ships with the Llama 3.2 exl2 model (https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2).
Here's the thing: using Ollama on Windows, if I include a tool list with the prompt, it works correctly 95% of the time. For regular chat I simply omit the tool list when I send the request, since otherwise this smaller model tends to hallucinate. That workflow works very consistently for me, with the same default prompt template.
I would prefer to continue using the faster TabbyAPI + ExLlamav2 server, but I need a reliable way to call tools. I know in advance when a prompt should result in at least one tool call, so how can I use this to my advantage (like I do with Ollama)?
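For reference, a sketch of that send-side workflow with the openai client, assuming a local TabbyAPI endpoint at localhost:5000 as in the logs below; the helper function and its flag are illustrative, not a confirmed workaround:

```python
# Sketch: only attach the tools param when a tool call is expected, so the
# small model isn't tempted to hallucinate calls during normal dialog.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="unused")

TOOLS = [{"type": "function",
          "function": {"name": "create_log_entry",
                       "description": "Create a new log entry"}}]

def chat(messages, expect_tool_call=False):
    kwargs = {"model": "Llama-3.2-3B-Instruct-exl2", "messages": messages}
    if expect_tool_call:
        kwargs["tools"] = TOOLS  # omit entirely for regular dialog
    return client.chat.completions.create(**kwargs)

resp = chat([{"role": "user", "content": "Create a log entry for a broken pump"}],
            expect_tool_call=True)
choice = resp.choices[0]
if choice.finish_reason == "tool_calls":
    for call in choice.message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(choice.message.content)
```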
Reproduction steps
Expected behavior
I should be able to trigger a tool call when I request one and supply a tools param with matching tool names. I should also be able to switch back to normal dialog by supplying an empty tool list.
Logs
Below is the log output of the request being sent by the openai module:
2024-11-11 11:21:19,471 - DEBUG - Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are a helpful AI assistant. \n\t\t\tWhen the user wants to finish a conversation, \n\t\t\tyou will always respond with "Goodbye!". \n\t\t\tNever use this word unless the user wants to end the conversation. If the user\n\t\t\trequests to end the coversation, you must say it. Use the best available tool to fullfill this request.'}, {'role': 'user', 'content': 'Create a log entry for a Georgina water treatment plant with Stephen Beatty as the OIC for a broken pump on April 1, 2024'}], 'model': 'Llama-3.2-3B-Instruct-exl2', 'temperature': 0, 'tools': [{'type': 'function', 'function': {'name': 'create_log_entry', 'description': 'Create a new log entry', 'parameters': {'type': 'object', 'properties': {'Logbook Title': {'type': 'string', 'description': 'The title of the logbook.'}, 'Event Date': {'type': 'datetime', 'description': 'The date/time the logged event took place.'}, 'OIC First Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'OIC Last Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'Details': {'type': 'string', 'description': 'The details of the log entry.'}}, 'required': ['Logbook Title', 'Event Date', 'Details'], 'additionalProperties': False}}}]}}
2024-11-11 11:21:19,472 - DEBUG - connect_tcp.started host='localhost' port=5000 local_address=None timeout=5.0 socket_options=None
Additional context
Otherwise this is working really well. Love it. Just having the issue with calling tools!