
[BUG] Tool Calling not working for Llama 3.2 3B #234

Open
4 tasks done
raisbecka opened this issue Nov 11, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@raisbecka

OS

Windows

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

I am using WSL2 Docker with CUDA. Regular text generation runs really well and as expected, but tool calling doesn't work even when the exact name of the tool is in the prompt. I am using the default prompt template that comes with the Llama 3.2 exl2 model (https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2).

Here's the thing: Using Ollama on Windows, if I include a tool list with the prompt, it works correctly 95% of the time. For regular chat, I simply omit the tool list when I send the request - otherwise this smaller model tends to hallucinate. But this workflow works very consistently for me. This is with the same default prompt template.

I would prefer to continue using the faster TabbyAPI + ExLlamav2 server, but I need a reliable way to call tools. I know in advance when a prompt should result in at least one tool call, so how can I use this to my advantage (like I do with Ollama)?

Reproduction steps

  • Use model: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2/ (the 3_5 branch).
  • Use default prompt template included with model
  • Use openai module in Python to connect
  • Try regular chat messages with no tools parameter supplied. They work.
  • Try tool prompts with the tools param set to a list of properly defined tools. Doesn't work. (A minimal sketch of this request is shown below.)
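
Roughly, the request looks like this (a trimmed-down sketch; the base_url and api_key are placeholders for my local setup, and the tool schema is cut down from the full one in the logs below):

from openai import OpenAI

# Placeholders for my local setup; the full request is in the log output below.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "create_log_entry",
        "description": "Create a new log entry",
        "parameters": {
            "type": "object",
            "properties": {
                "Logbook Title": {"type": "string", "description": "The title of the logbook."},
                "Details": {"type": "string", "description": "The details of the log entry."},
            },
            "required": ["Logbook Title", "Details"],
        },
    },
}]

response = client.chat.completions.create(
    model="Llama-3.2-3B-Instruct-exl2",
    temperature=0,
    messages=[{"role": "user", "content": "Create a log entry for a broken pump"}],
    tools=tools,  # omitted entirely on regular chat turns
)

# Expected: a populated tool_calls list. Actual: no tool call comes back.
print(response.choices[0].message.tool_calls)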

Expected behavior

I should be able to trigger a tool call when I request one and supply a tools param with matching tool names. I should also be able to switch back to normal dialog by supplying an empty tool list.
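
For reference, with an OpenAI-compatible server I'd expect a successful tool call to come back with a populated tool_calls list, roughly like this (a generic illustration of the OpenAI response shape, not actual TabbyAPI output):

# Rough shape of the choice I'd expect on a successful tool call
# (generic OpenAI-style example for comparison, not TabbyAPI output).
expected_choice = {
    "index": 0,
    "finish_reason": "tool_calls",
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",  # placeholder id
            "type": "function",
            "function": {
                "name": "create_log_entry",
                "arguments": '{"Logbook Title": "Georgina", "Details": "Broken pump"}',
            },
        }],
    },
}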

Logs

Below is the log output of the request being sent by the openai module:

2024-11-11 11:21:19,471 - DEBUG - Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are a helpful AI assistant. \n\t\t\tWhen the user wants to finish a conversation, \n\t\t\tyou will always respond with "Goodbye!". \n\t\t\tNever use this word unless the user wants to end the conversation. If the user\n\t\t\trequests to end the coversation, you must say it. Use the best available tool to fullfill this request.'}, {'role': 'user', 'content': 'Create a log entry for a Georgina water treatment plant with Stephen Beatty as the OIC for a broken pump on April 1, 2024'}], 'model': 'Llama-3.2-3B-Instruct-exl2', 'temperature': 0, 'tools': [{'type': 'function', 'function': {'name': 'create_log_entry', 'description': 'Create a new log entry', 'parameters': {'type': 'object', 'properties': {'Logbook Title': {'type': 'string', 'description': 'The title of the logbook.'}, 'Event Date': {'type': 'datetime', 'description': 'The date/time the logged event took place.'}, 'OIC First Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'OIC Last Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'Details': {'type': 'string', 'description': 'The details of the log entry.'}}, 'required': ['Logbook Title', 'Event Date', 'Details'], 'additionalProperties': False}}}]}}

2024-11-11 11:21:19,472 - DEBUG - connect_tcp.started host='localhost' port=5000 local_address=None timeout=5.0 socket_options=None

Additional context

Otherwise this is working really well. Love it. Just having the issue with calling tools!

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
@raisbecka raisbecka added the bug Something isn't working label Nov 11, 2024
@raisbecka
Author

To add further material to this:

I get the below response when prompting TabbyAPI with tools and a tool use prompt:

TabbyAPI Response: {
  "id": "chatcmpl-31cb9af0f64346faab07106a098a93fb",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "stop_str": "<|tool_start|>",
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null
    }
  ],
  "created": 1731350901,
  "model": "Llama-3.2-3B-Instruct-exl2",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 575,
    "completion_tokens": 1,
    "total_tokens": 576
  }
}

The finish reason and stop string don't look correct to me - I'm not sure. It LOOKS like it's stopping the reply right as it starts to formulate the tool call? I could be wrong...

@raisbecka
Author

One last thing: I just had Cline (Claude Dev) swap my backend from ExLlamav2 + TabbyAPI to Ollama, and my function calls are working again - almost 100% accurate. So this is not a config or model issue; it seems to be something else.

@Nepherpitou

Same here with Qwen 2.5 Coder 32B and Qwen 72B.

@DocShotgun
Member

DocShotgun commented Nov 23, 2024

I took a quick look at the OAI tool calling docs and the relevant code in tabby, since I'm not familiar with how it's implemented here. My findings are below:

  1. Tool calling only works with the custom tool calling template written by the PR author and provided with the repo, and not with official tool calling templates included with models like L3.2 or Qwen2.5. This is because of the template vars used to format the prompt. The template also pulls in the wrong Llama 3 EOS token (<|end_of_text|> instead of <|eot_id|>).
  2. It looks like the underlying logic is that it relies on a specific tool call trigger string to be generated by the model (<|tool_start|> in the example template), before then triggering a re-prompt of the model for another generation using JSON schema and extracting the args from that.
  3. Even with the template, the model sometimes fails to generate the proper string needed to trigger the tool call logic, occasionally producing things like <|tool_start|.
  4. There's a minor bug introduced with the vision PR that causes a TypeError (although this issue was created before that was implemented). These two calls:
    pre_tool_prompt = await apply_chat_template(data, gen["text"])

    pre_tool_prompt = await apply_chat_template(data, current_generations)

    need to be updated to reflect that apply_chat_template now returns a tuple (a sketch of the fix is shown after this list).
  5. There are further errors after that related to JSON schema generation - I narrowed it down to schemas involving an array type, but I'm unsure whether it's a tabby/exl2 issue or an upstream LMFE issue.
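
For point 4, the change would look roughly like this (a sketch only - I'm assuming the prompt string is the first element of the tuple that apply_chat_template now returns):

# Unpack the tuple instead of assigning it directly (assumes the prompt
# is the first element; the remaining element is ignored here).
pre_tool_prompt, _ = await apply_chat_template(data, gen["text"])

pre_tool_prompt, _ = await apply_chat_template(data, current_generations)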

Yeah, confirmed that tool calling is broken right now; it probably needs to be looked at again.
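
To make the flow in point 2 concrete, here's a simplified sketch of the idea (an illustration only, not tabby's actual code; generate and generate_json stand in for whatever the backend exposes):

TOOL_CALL_TRIGGER = "<|tool_start|>"  # trigger string from the example template

async def chat_with_tools(prompt, tool_schema, generate, generate_json):
    # First pass: normal generation that stops if the model emits the trigger.
    text, stop_str = await generate(prompt, stop=[TOOL_CALL_TRIGGER])
    if stop_str != TOOL_CALL_TRIGGER:
        # No trigger produced, so this is a plain chat reply with no tool calls.
        return {"content": text, "tool_calls": []}
    # Second pass: re-prompt with the JSON schema to extract the tool call args.
    args = await generate_json(prompt + text, schema=tool_schema)
    return {"content": "", "tool_calls": [args]}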

@bdashore3
Member

bdashore3 commented Nov 24, 2024

Based on what DocShotgun said, tool calling in its current state is broken and those template renders need to be fixed.

However, there's a bigger issue at play here. Tool calling (and vision by proxy) is not standardized.

There are a couple of ways to handle tool calling:

  1. Hardcode cases for each model arch
  2. Provide a templating system for users

In my (and Ollama's) opinion, option 2 is the best way to handle situations like these. It gives users the freedom to make their own templates instead of expecting the devs to support a random new model on day one.

Ollama is not just a model-running tool, it's an ecosystem. It provides a centralized repo of models (basically an HF mirror) along with templates written for its own templating system. TabbyAPI, by contrast, is a decentralized system by design: users can run any model provided it's in the exl2 format (or otherwise supported by the exllama runtime). Being decentralized reduces the maintenance burden on the devs, since TabbyAPI is a hobby project.

Therefore, anyone can create a template that supports tool calling. Take a look at the docs written by @gittb; he did a great job.

If someone does make a tool template for L3.2 based on the official one, feel free to PR it to the templates repository.
