Paper | Model | Usage | Fine-Tuning | Evaluation |
- [12.2024]: We are excited to announce the release of Hammer2.1, our suite of Large Action Models! These models have achieved impressive rankings on the Berkeley Function-Calling Leaderboard.
- [10.2024]: We're excited to release lightweight Hammer 2.0 models (0.5B, 1.5B, 3B, and 7B) with strong function calling capability, which empower developers to build personalized, on-device agentic applications.
- [10.2024]: We have now made our code and accompanying paper for Hammer: Robust Function-Calling for On-Device Language Models via Function Masking publicly available.
- [09.2024]: The Hammer models are released! Focusing on on-device applications, we release models ranging from 1.5B and 4B to 7B parameters.
Hammer is a series of lightweight language models with strong function calling capabilities, enabling developers to create personalized, on-device agentic applications. We have released several models based on the Function Masking technique discussed in the paper. These models are available from the MadeAgents organization on Hugging Face.
Hammer models offer flexibility in deployment and usage, fully supporting both vLLM deployment and Hugging Face Transformers tool calling. The sections below describe how to use each option:
Before using vLLM, first clone the Hammer code repository and change into the Hammer directory:
git clone https://github.com/MadeAgents/Hammer.git
cd Hammer
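If vLLM is not already installed in your environment, it can be installed from PyPI; as noted below, the built-in tool-calling support requires vllm>=0.6:
pip install "vllm>=0.6"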
vLLM offers efficient serving with lower latency. To serve the model with vLLM:
vllm serve MadeAgents/Hammer2.1-1.5b --host 0.0.0.0 --port 8000 --tensor-parallel-size 1
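After the server starts, you can sanity-check it by listing the served models through the standard OpenAI-compatible endpoint:
curl http://localhost:8000/v1/models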
Once the model is served, you can use the following Hammer client to interact with it for function calling:
from client import HammerChatCompletion, HammerConfig
config = HammerConfig(base_url="http://localhost:8000/v1/", model="MadeAgents/Hammer2.1-1.5b")
llm = HammerChatCompletion.from_config(config)
# Example conversation
messages = [
{"role": "user", "content": "What's the weather like in New York?"},
{"role": "assistant","content": '','tool_calls': {"name": "get_weather", "arguments": {"location": "New York, NY ", "unit": "celsius"}}},
{"role": "tool", "name": "get_weather", "content": '{"temperature": 35, "description": "Partly cloudy"}'},
{"role": "user", "content": "Now, search for the weather in San Francisco."}
]
# Example function definition (optional)
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
},
"required": ["location"]
}
},
{
"name": "respond",
"description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
"parameters": {
"type": "object",
"properties": {
"message": {"type": "string", "description": "The content of the message to respond to."}
},
"required": ["message"]
}
}
]
# tool calls
response = llm.completion(messages, tools=tools)
print(response)
# non tool calls
messages = [
{"role": "user", "content": "What's the weather like in New York?"},
{"role": "assistant","content": '','tool_calls': {"name": "get_weather", "arguments": {"location": "New York, NY ", "unit": "celsius"}}},
{"role": "tool", "name": "get_weather", "content": '{"temperature": 35, "description": "Partly cloudy"}'},
]
response = llm.completion(messages, tools=tools)
print(response)
response = llm.completion(messages)
print(response)
# chat
response = llm.completion([{"role": "user", "content": "What's the weather like in New York?"}])
print(response)
Hammer2.1 supports vLLM's built-in tool calling, which requires vllm>=0.6. To enable it, start vLLM's OpenAI-compatible server with:
vllm serve MadeAgents/Hammer2.1-1.5b --enable-auto-tool-choice --tool-call-parser hermes
Then use it the same way you would use OpenAI's tool calling:
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
"default": "celsius"
},
},
"required": ["location","format"],
},
}
},
{
"type": "function",
"function": {
"name": "get_n_day_weather_forecast",
"description": "Get an N-day weather forecast",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
"default": "celsius"
},
"num_days": {
"type": "integer",
"description": "The number of days to forecast",
"default": 1
}
},
"required": ["location", "format", "num_days"]
},
}
},
]
from openai import OpenAI
openai_api_key = "None"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
query = """What's the weather like today in San Francisco"""
chat_response = client.chat.completions.create(
model="MadeAgents/Hammer2.1-1.5b",
messages=[
{"role": "user", "content": query},],
tools=tools,
temperature=0
)
print(chat_response.choices[0].message.content)
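With tool choice enabled, a function call comes back in the standard OpenAI response shape: the parsed calls sit in message.tool_calls, and message.content may be empty when the model only emits a tool call. Below is a minimal sketch for inspecting and dispatching them; executing the tool and sending back a {"role": "tool", ...} message is left to your application:
import json

message = chat_response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        name = tool_call.function.name                        # e.g. "get_current_weather"
        arguments = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
        print(f"Model requested {name} with {arguments}")
        # Run your own tool implementation here, append the result as a
        # {"role": "tool", ...} message, and call the API again for the final answer.
else:
    print(message.content)  # ordinary chat answer, no tool call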
Hammer2.1’s chat template also includes a tool calling template, meaning that you can use Hugging Face transformers’ tool calling support. This is a simple example of how to use our model using Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-1.5b")
model = AutoModelForCausalLM.from_pretrained("MadeAgents/Hammer2.1-1.5b", torch_dtype=torch.bfloat16, device_map="auto")
# Example conversation
messages = [
{"role": "user", "content": "What's the weather like in New York?"},
{"role": "assistant","content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY ", "unit": "celsius"}\n```'},
{"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
{"role": "user", "content": "Now, search for the weather in San Francisco."}
]
# Example function definition (optional)
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
},
"required": ["location"]
}
},
{
"name": "respond",
"description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
"parameters": {
"type": "object",
"properties": {
"message": {"type": "string", "description": "The content of the message to respond to."}
},
"required": ["message"]
}
}
]
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
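The decoded text is expected to contain the tool call as JSON inside a fenced block, mirroring the assistant turn in the example above. Below is a minimal sketch for recovering a structured call from that output; the exact output format is an assumption based on that example and may differ across Hammer versions:
import json
import re

generated = tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
# Look for a ```...``` block holding the JSON tool call; fall back to the raw text.
match = re.search(r"```(?:json)?\s*(.*?)\s*```", generated, re.DOTALL)
payload = match.group(1) if match else generated
try:
    call = json.loads(payload)
    print("tool call:", call)
except json.JSONDecodeError:
    print("plain response:", generated)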
You should install dependencies using the following command:
pip install -r requirements.txt
Download the datasets Salesforce/xlam-function-calling-60k and MadeAgents/xlam-irrelevance-7.5k and place them in the data/train directory. Simply run the command below to prepare the training data:
python train/data_processing.py
After setting up the training data, you can now train the model using LLaMA-Factory. Replace <MODEL> with the path or name of the base model you want to use:
bash scripts/train.sh <MODEL>
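For example, to fine-tune from a Qwen2.5 instruct checkpoint (an illustrative choice of base model, not a requirement):
bash scripts/train.sh Qwen/Qwen2.5-1.5B-Instruct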
We conduct a comprehensive evaluation of the model's performance on tool-use leaderboards such as the Berkeley Function-Calling Leaderboard (BFCL), API-Bank, ToolAlpaca, Nexus Raven, and Seal-Tools. For the evaluation code of the BFCL leaderboard, please refer directly to the official documentation. The other evaluation sets have minor issues such as inconsistent formats and label errors, so we applied light preprocessing, including format conversion and removal of erroneous samples. Specifically:
- apibank_l1 (API-Bank): Only the format of the data has been converted, resulting in 399 samples.
- apibank_l2 (API-Bank): 8 samples for which the ground truth function is not in the candidate function list are filtered out, and the data format is converted, resulting in 127 samples.
- NexusRaven (NexusRaven): Only the format of the data has been converted, resulting in 318 samples.
- sealtool (Seal-Tools): Only single-turn test data is considered, and the data format is converted, resulting in 294 samples.
- toolalpaca (ToolAlpaca): Textual tool definitions were converted to JSON format, and prompt conversion was applied, resulting in 114 samples.
The processed evaluation datasets are placed under the data/test directory and are all in the Hammer function-calling prompt format (examples available at Hammer dataset example).
Use the following command to run LLM inference on a specific dataset with a specific model:
bash scripts/eval.sh <MODEL> <DATASET>
For instance, to evaluate the Hammer2.1-7b model on the NexusRaven dataset:
bash scripts/eval.sh /path/to/Hammer2.1-7b NexusRaven
If you want to test the performance of other models, you can obtain the original datasets from the data/test/original directory. Use the model you wish to test to perform inference, generating a JSONL file of JSON results that contains label and predict fields. You can refer to the format in data/examples_eval.jsonl. Finally, run the evaluation script with the following command:
python evaluation/evaluate.py <outputs_dir>
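For illustration only, each line of the output JSONL pairs the ground-truth call with the model's prediction; the field values below are hypothetical, and data/examples_eval.jsonl remains the authoritative reference for the exact format:
{"label": [{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}], "predict": [{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}]}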
This code is licensed under CC BY 4.0.
If you use Hammer, please cite our paper:
@misc{lin2024hammer,
title={Hammer: Robust Function-Calling for On-Device Language Models via Function Masking},
author={Qiqiang Lin and Muning Wen and Qiuying Peng and Guanyu Nie and Junwei Liao and Jun Wang and Xiaoyun Mo and Jiamu Zhou and Cheng Cheng and Yin Zhao and Jun Wang and Weinan Zhang},
year={2024},
eprint={2410.04587},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.04587},
}