Hammer: Robust Function-Calling for On-Device Language Models via Function Masking

Overview

Hammer is a series of lightweight language models with strong function-calling capabilities, enabling developers to build personalized, on-device agentic applications. We have released several models trained with the Function Masking techniques described in the paper. The models are available under the MadeAgents organization on Hugging Face.

Usage

Hammer models can be served with vLLM or run directly through Hugging Face Transformers tool calling. The sections below cover each path.

Using vLLM

Option 1: Using Hammer client (Recommended)

Before using vLLM, clone the Hammer code repository and change into the Hammer directory:

git clone https://github.com/MadeAgents/Hammer.git
cd Hammer

vLLM offers efficient serving with lower latency. To serve the model with vLLM:

vllm serve MadeAgents/Hammer2.1-1.5b --host 0.0.0.0 --port 8000 --tensor-parallel-size 1

Once the model is served, you can use the following Hammer client to interact with it for function calling:

from client import HammerChatCompletion, HammerConfig

# Point the client at the vLLM server started above
config = HammerConfig(base_url="http://localhost:8000/v1/", model="MadeAgents/Hammer2.1-1.5b")
llm = HammerChatCompletion.from_config(config)

# Example conversation

messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": "", "tool_calls": {"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 35, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definition (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

# Tool call: the last user turn asks for data that get_weather can provide
response = llm.completion(messages, tools=tools)
print(response)

# No tool call: the conversation ends on a tool result, so the model is expected to reply rather than call get_weather
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": "", "tool_calls": {"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 35, "description": "Partly cloudy"}'},
]
response = llm.completion(messages, tools=tools)
print(response)

# The same conversation without passing any tool definitions
response = llm.completion(messages)
print(response)

# Plain chat: a single user turn with no tools
response = llm.completion([{"role": "user", "content": "What's the weather like in New York?"}])
print(response)
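
Once the model returns calls, the application still has to route them. The sketch below assumes the calls have already been parsed into a list of dictionaries with "name" and "arguments" keys, the shape used in the messages above. The return type of llm.completion is not specified here, so the parsing step is left out. route_calls and fake_get_weather are hypothetical names introduced only for illustration.

```python
import json
from typing import Any, Callable, Dict, List


# Hypothetical local implementation standing in for a real weather service.
def fake_get_weather(location: str, unit: str = "celsius") -> Dict[str, Any]:
    return {"location": location, "unit": unit, "temperature": 18}


def route_calls(calls: List[Dict[str, Any]],
                registry: Dict[str, Callable[..., Any]]) -> List[Dict[str, Any]]:
    """Turn parsed calls into chat messages.

    A `respond` call becomes an assistant message. Any other call is executed
    through `registry` and becomes a tool message. Unknown names are reported
    back as an error payload instead of raising.
    """
    messages = []
    for call in calls:
        name = call["name"]
        args = call.get("arguments", {})
        if name == "respond":
            messages.append({"role": "assistant", "content": args.get("message", "")})
        elif name in registry:
            result = registry[name](**args)
            messages.append({"role": "tool", "name": name, "content": json.dumps(result)})
        else:
            error = {"error": f"unknown tool: {name}"}
            messages.append({"role": "tool", "name": name, "content": json.dumps(error)})
    return messages


routed = route_calls(
    [
        {"name": "get_weather", "arguments": {"location": "San Francisco, CA"}},
        {"name": "respond", "arguments": {"message": "Here is the forecast."}},
    ],
    {"get_weather": fake_get_weather},
)
```

Treating respond as an ordinary tool keeps the routing uniform: every model turn is a call, and only the handler decides whether it ends up as text for the user or as a tool result fed back into the conversation.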

Option 2: Using vLLM’s built-in tool calling

Hammer2.1 supports vLLM’s built-in tool calling, which requires vllm>=0.6. To enable it, start vLLM’s OpenAI-compatible server with:

vllm serve MadeAgents/Hammer2.1-1.5b --enable-auto-tool-choice --tool-call-parser hermes

Then call it the same way you would use GPT’s tool calling through the OpenAI client:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius"
                    },
                },
                "required": ["location","format"],
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius"
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                        "default": 1
                    }
                },
                "required": ["location", "format", "num_days"]
            },
        }
    },
]


from openai import OpenAI

# vLLM does not validate the key unless the server was started with --api-key
openai_api_key = "None"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

query = "What's the weather like today in San Francisco"

chat_response = client.chat.completions.create(
    model="MadeAgents/Hammer2.1-1.5b",
    messages=[{"role": "user", "content": query}],
    tools=tools,
    temperature=0,
)

# Parsed calls arrive in `tool_calls`; `content` may be empty when the model calls a tool
message = chat_response.choices[0].message
print(message.tool_calls or message.content)
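
To complete the loop, the application runs each returned call and sends the result back as a tool message. The following is a minimal sketch of that step, assuming the OpenAI Python client's response shape (each call exposes id, function.name, and function.arguments as a JSON string). run_tool_calls and the get_current_weather implementation are hypothetical, and a SimpleNamespace stands in for the server response so the example runs offline.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def run_tool_calls(tool_calls,
                   registry: Dict[str, Callable[..., Any]]) -> List[Dict[str, str]]:
    """Execute each tool call and build the `tool` messages to append to the conversation."""
    results = []
    for call in tool_calls or []:
        # Arguments arrive as a JSON-encoded string, not as a dict.
        args = json.loads(call.function.arguments)
        output = registry[call.function.name](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return results


# Hypothetical implementation used only for this sketch.
def get_current_weather(location: str, format: str = "celsius") -> Dict[str, Any]:
    return {"location": location, "format": format, "temperature": 18}


# Stand-in for `chat_response.choices[0].message.tool_calls`.
example_calls = [
    SimpleNamespace(
        id="call_0",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments='{"location": "San Francisco, CA", "format": "celsius"}',
        ),
    )
]
tool_messages = run_tool_calls(example_calls, {"get_current_weather": get_current_weather})
```

In a real session, append the assistant message and these tool messages to the conversation, then call client.chat.completions.create again so the model can produce its final answer.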

Using Hugging Face Transformers

Hammer2.1’s chat template also includes a tool calling template, so you can use Hugging Face Transformers’ tool calling support. A simple example with Transformers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-1.5b")
model = AutoModelForCausalLM.from_pretrained("MadeAgents/Hammer2.1-1.5b", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definition (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

# Render the conversation and tool definitions with the model's chat template
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
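
The decoded text still needs to be turned into structured calls. The sketch below assumes the model wraps its JSON in a triple-backtick fence, as the assistant message in the example conversation does; the exact output format is not documented here, so treat that as an assumption. parse_tool_calls is a hypothetical helper, and it accepts either a single call object or a list of them.

```python
import json
import re
from typing import Any, Dict, List

# Built by repetition so this example contains no literal fence of its own.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract calls from model output; return an empty list when nothing parses."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text.strip()
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        return []
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []
    # Keep only entries that look like calls.
    return [call for call in parsed if isinstance(call, dict) and "name" in call]


sample_output = (
    FENCE
    + '\n{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}\n'
    + FENCE
)
calls = parse_tool_calls(sample_output)
```

Returning an empty list on malformed output lets the caller decide whether to retry generation or fall back to showing the raw text.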

Fine-Tuning

Install Dependencies

You should install dependencies using the following command:

pip install -r requirements.txt

Data Processing

Download the datasets Salesforce/xlam-function-calling-60k and MadeAgents/xlam-irrelevance-7.5k and place them in the data/train directory. Simply run the command below to prepare the training data:

python train/data_processing.py

Train the Model

After setting up the training data, you can now train the model using LLaMA-Factory. Replace <MODEL> with the path or name of the base model you want to use:

bash scripts/train.sh <MODEL>

Evaluation

We evaluate the models on tool-use benchmarks: the Berkeley Function Calling Leaderboard (BFCL), API-Bank, ToolAlpaca, NexusRaven, and Seal-Tools. For BFCL evaluation code, refer to the official documentation. The other evaluation sets have minor issues such as inconsistent formats and label errors, so we preprocessed them by converting formats and removing erroneous samples. Specifically:

  • apibank_l1 (API-Bank): Only the format of the data has been converted, resulting in 399 samples.
  • apibank_l2 (API-Bank): 8 samples for which the ground truth function is not in the candidate function list are filtered out, and the data format is converted, resulting in 127 samples.
  • NexusRaven (NexusRaven): Only the format of the data has been converted, resulting in 318 samples.
  • sealtool (Seal-Tools): Only single-turn test data is considered, and the data format is converted, resulting in 294 samples.
  • toolalpaca (ToolAlpaca): Textual tool definitions were converted to JSON format, and prompt conversion was applied, resulting in 114 samples.
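
The apibank_l2 filtering rule above can be sketched as follows. This is not the repository's preprocessing code: the field names "tools" and "label" and the helper names are assumptions made for illustration, so adapt them to the actual dataset schema.

```python
from typing import Any, Dict, List


def has_reachable_label(sample: Dict[str, Any]) -> bool:
    """True when every ground-truth function name appears among the candidate tools."""
    candidates = {tool["name"] for tool in sample["tools"]}
    return all(call["name"] in candidates for call in sample["label"])


def filter_samples(samples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop samples whose ground-truth function is missing from the candidate list."""
    return [sample for sample in samples if has_reachable_label(sample)]


# Two toy samples: the second one labels a function that is not offered as a candidate.
toy_samples = [
    {"tools": [{"name": "get_weather"}], "label": [{"name": "get_weather"}]},
    {"tools": [{"name": "get_weather"}], "label": [{"name": "book_flight"}]},
]
kept = filter_samples(toy_samples)
```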

The processed evaluation datasets are placed under the data/train directory and all use the Hammer function calling prompt format (see the Hammer dataset example).

Evaluate Hammer Model

Use the following command to run inference with a given model on a given dataset:

bash scripts/eval.sh <MODEL> <DATASET>

For instance, to evaluate the Hammer2.1-7b model on the NexusRaven dataset:

bash scripts/eval.sh /path/to/Hammer2.1-7b NexusRaven

Evaluate Other Models

To test other models, take the original datasets from the data/test/original directory and run inference with the model you want to test. Save the results as a JSONL file in which each JSON record contains label and predict fields; see data/examples_eval.jsonl for the format. Finally, run the evaluation script:

python evaluation/evaluate.py <outputs_dir>
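
A minimal sketch of producing such a JSONL file, assuming only what is stated above: one JSON object per line with label and predict fields. The value format (strings here), the file name, and the write_predictions helper are assumptions; check data/examples_eval.jsonl for the exact shape the evaluation script expects.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def write_predictions(records: Iterable[Dict[str, Any]], path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            missing = {"label", "predict"} - record.keys()
            if missing:
                raise ValueError(f"record is missing fields: {sorted(missing)}")
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo in a temporary directory; in practice, write into the outputs directory
# that is then passed to evaluation/evaluate.py.
example_records = [
    {
        "label": '[{"name": "get_weather", "arguments": {"location": "Paris"}}]',
        "predict": '[{"name": "get_weather", "arguments": {"location": "Paris"}}]',
    }
]
with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = Path(tmp_dir) / "outputs" / "predictions.jsonl"
    written = write_predictions(example_records, output_path)
    lines = output_path.read_text(encoding="utf-8").splitlines()
```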

Licenses

This code is licensed under CC BY 4.0 (cc-by-4.0).

Citation

If you use Hammer, please cite our paper:

@misc{lin2024hammer,
      title={Hammer: Robust Function-Calling for On-Device Language Models via Function Masking},
      author={Qiqiang Lin and Muning Wen and Qiuying Peng and Guanyu Nie and Junwei Liao and Jun Wang and Xiaoyun Mo and Jiamu Zhou and Cheng Cheng and Yin Zhao and Jun Wang and Weinan Zhang},
      year={2024},
      eprint={2410.04587},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.04587},
}
