
[BUG] Tool Calling not working for Llama 3.2 3B #234

Open
4 tasks done
raisbecka opened this issue Nov 11, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@raisbecka

OS

Windows

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

I am using WSL2 Docker with CUDA. Regular text generation runs really well and as expected, but tool calling doesn't work even when the exact name of the tool is in the prompt. I am using the default prompt template that comes with the Llama 3.2 exl2 model (https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2).

Here's the thing: Using Ollama on Windows, if I include a tool list with the prompt, it works correctly 95% of the time. For regular chat, I simply omit the tool list when I send the request - otherwise this smaller model tends to hallucinate. But this workflow works very consistently for me. This is with the same default prompt template.

I would prefer to continue using the faster TabbyAPI + ExLlamav2 server, but I need a reliable way to call tools. I know in advance when a prompt should result in at least one tool call, so how can I use this to my advantage (like I do with Ollama)?

Reproduction steps

  • Use model: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2/ (the 3_5 branch).
  • Use default prompt template included with model
  • Use openai module in Python to connect
  • Try regular chat messages with no tools parameter supplied. They work.
  • Try tool prompts with the tools param set to a list of properly defined tools. Doesn't work. (A minimal sketch of this request is shown below.)
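
Roughly, the request looks like this (a trimmed-down sketch; the base_url and api_key are placeholders for my local setup, and the tool schema is cut down from the full one in the logs below):

from openai import OpenAI

# Placeholders for my local setup; the full request is in the log output below.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "create_log_entry",
        "description": "Create a new log entry",
        "parameters": {
            "type": "object",
            "properties": {
                "Logbook Title": {"type": "string", "description": "The title of the logbook."},
                "Details": {"type": "string", "description": "The details of the log entry."},
            },
            "required": ["Logbook Title", "Details"],
        },
    },
}]

response = client.chat.completions.create(
    model="Llama-3.2-3B-Instruct-exl2",
    temperature=0,
    messages=[{"role": "user", "content": "Create a log entry for a broken pump"}],
    tools=tools,  # omitted entirely on regular chat turns
)

# Expected: a populated tool_calls list. Actual: no tool call comes back.
print(response.choices[0].message.tool_calls)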

Expected behavior

I should be able to trigger a tool call when I request one and supply a tools param with matching tool names. I should also be able to switch back to normal dialog by supplying an empty tool list.
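
For reference, with an OpenAI-compatible server I'd expect a successful tool call to come back with a populated tool_calls list, roughly like this (a generic illustration of the OpenAI response shape, not actual TabbyAPI output):

# Rough shape of the choice I'd expect on a successful tool call
# (generic OpenAI-style example for comparison, not TabbyAPI output).
expected_choice = {
    "index": 0,
    "finish_reason": "tool_calls",
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",  # placeholder id
            "type": "function",
            "function": {
                "name": "create_log_entry",
                "arguments": '{"Logbook Title": "Georgina", "Details": "Broken pump"}',
            },
        }],
    },
}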

Logs

Below is the log output of the request being sent by the openai module:

2024-11-11 11:21:19,471 - DEBUG - Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are a helpful AI assistant. \n\t\t\tWhen the user wants to finish a conversation, \n\t\t\tyou will always respond with "Goodbye!". \n\t\t\tNever use this word unless the user wants to end the conversation. If the user\n\t\t\trequests to end the coversation, you must say it. Use the best available tool to fullfill this request.'}, {'role': 'user', 'content': 'Create a log entry for a Georgina water treatment plant with Stephen Beatty as the OIC for a broken pump on April 1, 2024'}], 'model': 'Llama-3.2-3B-Instruct-exl2', 'temperature': 0, 'tools': [{'type': 'function', 'function': {'name': 'create_log_entry', 'description': 'Create a new log entry', 'parameters': {'type': 'object', 'properties': {'Logbook Title': {'type': 'string', 'description': 'The title of the logbook.'}, 'Event Date': {'type': 'datetime', 'description': 'The date/time the logged event took place.'}, 'OIC First Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'OIC Last Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'Details': {'type': 'string', 'description': 'The details of the log entry.'}}, 'required': ['Logbook Title', 'Event Date', 'Details'], 'additionalProperties': False}}}]}}

2024-11-11 11:21:19,472 - DEBUG - connect_tcp.started host='localhost' port=5000 local_address=None timeout=5.0 socket_options=None

Additional context

Otherwise this is working really well. Love it. Just having the issue with calling tools!

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will ask my questions politely.
@raisbecka raisbecka added the bug Something isn't working label Nov 11, 2024
@raisbecka
Author

To add further material to this:

I get the below response when prompting TabbyAPI with tools and a tool use prompt:

TabbyAPI Response: {
  "id": "chatcmpl-31cb9af0f64346faab07106a098a93fb",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "stop_str": "<|tool_start|>",
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null
    }
  ],
  "created": 1731350901,
  "model": "Llama-3.2-3B-Instruct-exl2",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 575,
    "completion_tokens": 1,
    "total_tokens": 576
  }
}

The finish reason and stop string don't look correct to me - I'm not sure. It LOOKS like it's stopping the reply right as it starts to formulate the tool call? I could be wrong...

@raisbecka
Author

One last thing: I just had Cline (Claude Dev) swap my backend from ExLlamav2 + TabbyAPI to Ollama, and my function calls are working again - almost 100% accurate. So this is not a config or model issue; it seems to be something else.

@Nepherpitou

Same here with Qwen 2.5 Coder 32B and Qwen 72B.

@DocShotgun
Member

DocShotgun commented Nov 23, 2024

I took a quick look at the OAI tool calling docs and the relevant code in tabby, since I'm not familiar with how it's implemented here. My findings are below:

  1. Tool calling only works with the custom tool calling template written by the PR author and provided with the repo, and not with official tool calling templates included with models like L3.2 or Qwen2.5. This is because of the template vars used to format the prompt. The template also pulls in the wrong Llama 3 EOS token (<|end_of_text|> instead of <|eot_id|>).
  2. It looks like the underlying logic is that it relies on a specific tool call trigger string to be generated by the model (<|tool_start|> in the example template), before then triggering a re-prompt of the model for another generation using JSON schema and extracting the args from that.
  3. Even with the template, the model sometimes fails to generate the proper string needed to trigger the tool call logic, occasionally producing things like <|tool_start|.
  4. There's a minor bug introduced with the vision PR that causes a TypeError (although this issue was created before that was implemented). These two calls:
    pre_tool_prompt = await apply_chat_template(data, gen["text"])

    pre_tool_prompt = await apply_chat_template(data, current_generations)

    need to be updated to reflect that apply_chat_template now returns a tuple (a sketch of the fix is shown after this list).
  5. There are further errors after that related to JSON schema generation - I narrowed it down to schemas involving an array type, but I'm unsure whether it's a tabby/exl2 issue or an upstream LMFE issue.
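
For point 4, the change would look roughly like this (a sketch only - I'm assuming the prompt string is the first element of the tuple that apply_chat_template now returns):

# Unpack the tuple instead of assigning it directly (assumes the prompt
# is the first element; the remaining element is ignored here).
pre_tool_prompt, _ = await apply_chat_template(data, gen["text"])

pre_tool_prompt, _ = await apply_chat_template(data, current_generations)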

Yeah, confirmed that tool calling is broken right now; it probably needs to be looked at again.
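
To make the flow in point 2 concrete, here's a simplified sketch of the idea (an illustration only, not tabby's actual code; generate and generate_json stand in for whatever the backend exposes):

TOOL_CALL_TRIGGER = "<|tool_start|>"  # trigger string from the example template

async def chat_with_tools(prompt, tool_schema, generate, generate_json):
    # First pass: normal generation that stops if the model emits the trigger.
    text, stop_str = await generate(prompt, stop=[TOOL_CALL_TRIGGER])
    if stop_str != TOOL_CALL_TRIGGER:
        # No trigger produced, so this is a plain chat reply with no tool calls.
        return {"content": text, "tool_calls": []}
    # Second pass: re-prompt with the JSON schema to extract the tool call args.
    args = await generate_json(prompt + text, schema=tool_schema)
    return {"content": "", "tool_calls": [args]}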

@bdashore3
Member

bdashore3 commented Nov 24, 2024

Based on what DocShotgun said, tool calling in its current state is broken and those template renders need to be fixed.

However, there's a bigger issue at play here. Tool calling (and vision by proxy) is not standardized.

There are a couple of ways to handle tool calling:

  1. Hardcode cases for each model arch
  2. Provide a templating system for users

In my (and Ollama's) opinion, option 2 is the best way to handle situations like these. It gives users the freedom to make their own templates instead of expecting the devs to support a random new model on day one.

Ollama is not just a model-running tool, it's an ecosystem. It provides a centralized repo of models (basically an HF mirror) along with templates written for its own templating system. TabbyAPI, by contrast, is a decentralized system by design: users can run any model provided it's in the exl2 format (or otherwise supported by the exllama runtime). Being decentralized reduces the maintenance burden on the devs, since TabbyAPI is a hobby project.

Therefore, anyone can create a template that supports tool calling. Take a look at the docs written by @gittb; he did a great job.

If someone does make a tool template for L3.2 based on the official one, feel free to PR it to the templates repository.
