[BFCL] Leaderboard Update, 11/17/2024 #748

Merged: 6 commits, Nov 19, 2024
123 changes: 61 additions & 62 deletions data_live.csv
@@ -1,72 +1,71 @@
Rank,Model,Live Overall Acc,AST Summary,Python Simple AST,Python Multiple AST,Python Parallel AST,Python Parallel Multiple AST,Irrelevance Detection,Relevance Detection
1,Gemini-1.5-Flash-002 (Prompt),76.28%,78.20%,77.91%,78.30%,93.75%,66.67%,72.91%,85.37%
2,GPT-4-turbo-2024-04-09 (FC),76.23%,77.45%,77.52%,77.63%,81.25%,66.67%,74.51%,73.17%
3,GPT-4o-2024-08-06 (FC),75.43%,74.98%,74.42%,75.12%,81.25%,70.83%,76.69%,63.41%
4,o1-mini-2024-09-12 (Prompt),75.39%,71.39%,73.26%,71.07%,75.00%,62.50%,82.74%,48.78%
1,GPT-4-turbo-2024-04-09 (FC),76.23%,77.45%,77.52%,77.63%,81.25%,66.67%,74.51%,73.17%
2,GPT-4o-2024-08-06 (FC),75.43%,74.98%,74.42%,75.12%,81.25%,70.83%,76.69%,63.41%
3,o1-mini-2024-09-12 (Prompt),75.39%,71.39%,73.26%,71.07%,75.00%,62.50%,82.74%,48.78%
4,Gemini-1.5-Flash-002 (FC),75.12%,71.24%,71.32%,70.97%,81.25%,75.00%,81.71%,60.98%
5,ToolACE-8B (FC),74.99%,73.33%,66.67%,74.93%,81.25%,70.83%,77.26%,80.49%
6,Claude-3.5-Sonnet-20240620 (FC),74.68%,76.85%,80.23%,76.76%,56.25%,58.33%,71.66%,68.29%
7,GPT-4o-mini-2024-07-18 (Prompt),74.63%,75.51%,79.46%,74.35%,93.75%,70.83%,73.26%,75.61%
8,Gemini-1.5-Pro-002 (Prompt),74.41%,77.00%,77.52%,76.76%,87.50%,75.00%,70.86%,65.85%
8,Gemini-1.5-Pro-002 (Prompt),74.28%,78.28%,79.84%,77.72%,87.50%,79.17%,68.11%,75.61%
9,Claude-3-Opus-20240229 (FC tools-2024-04-04),74.10%,74.53%,74.81%,75.60%,50.00%,41.67%,73.94%,63.41%
10,Functionary-Medium-v3.1 (FC),73.48%,81.05%,79.46%,81.87%,68.75%,70.83%,62.06%,70.73%
11,Gemini-1.5-Pro-001 (Prompt),73.12%,69.14%,67.44%,69.24%,93.75%,66.67%,80.00%,56.10%
12,Mistral-Medium-2312 (Prompt),73.10%,71.84%,68.60%,73.00%,81.25%,50.00%,100.00%,60.98%
13,o1-preview-2024-09-12 (Prompt),73.08%,77.53%,80.62%,76.76%,75.00%,79.17%,66.29%,73.17%
14,xLAM-8x22b-r (FC),71.97%,79.40%,78.29%,80.14%,75.00%,62.50%,60.00%,85.37%
15,Functionary-Small-v3.1 (FC),70.41%,75.58%,75.19%,75.89%,81.25%,62.50%,61.83%,85.37%
16,Mistral-small-2402 (FC),70.19%,68.16%,63.57%,71.46%,12.50%,12.50%,72.69%,82.93%
17,GPT-4o-mini-2024-07-18 (FC),70.19%,74.23%,72.87%,74.45%,87.50%,70.83%,63.54%,80.49%
18,Hammer2.0-7b (FC),69.79%,76.63%,74.42%,77.15%,81.25%,75.00%,58.17%,95.12%
19,Command-R-Plus (Prompt) (Original),69.75%,69.59%,66.67%,70.30%,68.75%,70.83%,69.83%,73.17%
20,Gemma-2-27b-it (Prompt),69.48%,77.30%,79.46%,77.24%,68.75%,62.50%,56.69%,87.80%
21,Gemma-2-9b-it (Prompt),69.21%,73.11%,73.64%,73.58%,56.25%,58.33%,62.40%,87.80%
22,Gemini-1.5-Flash-001 (Prompt),69.21%,75.21%,74.42%,75.12%,93.75%,75.00%,59.43%,82.93%
23,xLAM-8x7b-r (FC),69.12%,74.53%,68.22%,76.76%,62.50%,54.17%,60.00%,87.80%
24,GPT-4-turbo-2024-04-09 (Prompt),69.04%,84.64%,85.66%,84.57%,87.50%,75.00%,44.57%,82.93%
25,Open-Mixtral-8x22b (Prompt),68.46%,63.90%,72.87%,61.33%,81.25%,66.67%,75.54%,65.85%
26,mistral-large-2407 (FC),68.37%,79.55%,81.78%,79.27%,68.75%,75.00%,50.97%,75.61%
27,xLAM-7b-r (FC),67.88%,72.28%,71.32%,73.48%,31.25%,58.33%,59.77%,97.56%
28,GPT-3.5-Turbo-0125 (Prompt),67.48%,64.27%,63.57%,64.61%,68.75%,54.17%,71.77%,80.49%
29,Gorilla-OpenFunctions-v2 (FC),67.44%,61.42%,73.64%,58.73%,68.75%,41.67%,76.34%,73.17%
30,Gemini-1.5-Flash-002 (FC),67.35%,57.98%,58.14%,57.96%,68.75%,50.00%,81.94%,60.98%
31,Open-Mixtral-8x22b (FC),66.86%,71.16%,73.26%,72.32%,6.25%,41.67%,59.54%,82.93%
32,Meta-Llama-3-70B-Instruct (Prompt),66.15%,79.10%,78.68%,79.65%,68.75%,66.67%,45.14%,92.68%
33,Qwen2.5-7B-Instruct (Prompt),65.97%,72.13%,72.48%,72.32%,62.50%,66.67%,55.31%,92.68%
34,Gemini-1.5-Pro-001 (FC),65.53%,58.05%,57.75%,58.24%,75.00%,41.67%,77.03%,63.41%
35,Claude-3-Haiku-20240307 (Prompt),65.04%,74.53%,77.13%,74.64%,68.75%,45.83%,49.71%,82.93%
36,Open-Mixtral-8x7b (Prompt),64.95%,63.30%,57.36%,65.00%,68.75%,50.00%,67.31%,68.29%
37,Gemini-1.5-Flash-001 (FC),64.90%,59.48%,58.14%,60.46%,43.75%,41.67%,73.49%,58.54%
38,Gemini-1.5-Pro-002 (FC),64.59%,61.05%,58.91%,61.33%,81.25%,58.33%,69.71%,70.73%
39,Hammer2.0-1.5b (FC),63.22%,68.76%,70.54%,68.56%,56.25%,66.67%,53.37%,92.68%
40,Open-Mistral-Nemo-2407 (FC),62.37%,68.46%,71.71%,67.79%,62.50%,66.67%,53.14%,60.98%
41,DBRX-Instruct (Prompt),62.33%,72.06%,74.81%,71.65%,75.00%,58.33%,46.29%,87.80%
42,GPT-4o-2024-08-06 (Prompt),62.19%,42.55%,42.64%,42.82%,25.00%,41.67%,93.37%,36.59%
43,Hermes-2-Pro-Llama-3-8B (FC),61.79%,64.57%,67.44%,64.42%,56.25%,45.83%,57.83%,56.10%
44,Qwen2.5-1.5B-Instruct (Prompt),61.71%,60.37%,64.73%,59.88%,50.00%,41.67%,63.09%,75.61%
45,GPT-3.5-Turbo-0125 (FC),61.22%,76.25%,74.42%,77.82%,43.75%,50.00%,36.57%,97.56%
46,Llama-3.1-70B-Instruct (Prompt),61.13%,72.58%,77.13%,71.46%,87.50%,62.50%,42.17%,92.68%
47,Hermes-2-Pro-Llama-3-70B (FC),60.51%,55.28%,63.18%,53.04%,56.25%,66.67%,68.46%,60.98%
48,MiniCPM3-4B (FC),59.88%,50.71%,56.98%,49.47%,56.25%,33.33%,73.94%,58.54%
49,Gemini-1.0-Pro-002 (FC),58.91%,55.81%,58.91%,56.12%,37.50%,20.83%,63.20%,68.29%
10,Gemini-1.5-Pro-001 (Prompt),73.83%,72.96%,74.03%,72.32%,93.75%,75.00%,75.66%,63.41%
11,Functionary-Medium-v3.1 (FC),73.48%,81.05%,79.46%,81.87%,68.75%,70.83%,62.06%,70.73%
12,Gemini-1.5-Flash-002 (Prompt),73.21%,75.13%,77.52%,74.73%,87.50%,58.33%,70.06%,78.05%
13,Mistral-Medium-2312 (Prompt),73.10%,71.84%,68.60%,73.00%,81.25%,50.00%,100.00%,60.98%
14,o1-preview-2024-09-12 (Prompt),73.08%,77.53%,80.62%,76.76%,75.00%,79.17%,66.29%,73.17%
15,Gemini-1.5-Flash-001 (FC),72.81%,73.03%,72.48%,73.67%,62.50%,58.33%,72.91%,63.41%
16,Gemini-1.5-Pro-001 (FC),72.81%,71.16%,73.64%,70.59%,81.25%,62.50%,75.77%,63.41%
17,GoGoAgent,72.46%,72.21%,71.32%,72.42%,87.50%,62.50%,72.11%,87.80%
18,Gemini-1.5-Pro-002 (FC),72.41%,74.76%,74.81%,74.64%,87.50%,70.83%,68.80%,73.17%
19,xLAM-8x22b-r (FC),71.97%,79.40%,78.29%,80.14%,75.00%,62.50%,60.00%,85.37%
20,Functionary-Small-v3.1 (FC),70.41%,75.58%,75.19%,75.89%,81.25%,62.50%,61.83%,85.37%
21,Mistral-small-2402 (FC),70.19%,68.16%,63.57%,71.46%,12.50%,12.50%,72.69%,82.93%
22,GPT-4o-mini-2024-07-18 (FC),70.19%,74.23%,72.87%,74.45%,87.50%,70.83%,63.54%,80.49%
23,Hammer2.0-7b (FC),69.79%,76.63%,74.42%,77.15%,81.25%,75.00%,58.17%,95.12%
24,Command-R-Plus (Prompt) (Original),69.75%,69.59%,66.67%,70.30%,68.75%,70.83%,69.83%,73.17%
25,Gemma-2-27b-it (Prompt),69.48%,77.30%,79.46%,77.24%,68.75%,62.50%,56.69%,87.80%
26,Gemma-2-9b-it (Prompt),69.21%,73.11%,73.64%,73.58%,56.25%,58.33%,62.40%,87.80%
27,xLAM-8x7b-r (FC),69.12%,74.53%,68.22%,76.76%,62.50%,54.17%,60.00%,87.80%
28,GPT-4-turbo-2024-04-09 (Prompt),69.04%,84.64%,85.66%,84.57%,87.50%,75.00%,44.57%,82.93%
29,Open-Mixtral-8x22b (Prompt),68.46%,63.90%,72.87%,61.33%,81.25%,66.67%,75.54%,65.85%
30,mistral-large-2407 (FC),68.37%,79.55%,81.78%,79.27%,68.75%,75.00%,50.97%,75.61%
31,Gemini-1.5-Flash-001 (Prompt),68.24%,76.18%,74.81%,76.18%,93.75%,79.17%,55.20%,87.80%
32,xLAM-7b-r (FC),67.88%,72.28%,71.32%,73.48%,31.25%,58.33%,59.77%,97.56%
33,GPT-3.5-Turbo-0125 (Prompt),67.48%,64.27%,63.57%,64.61%,68.75%,54.17%,71.77%,80.49%
34,Gorilla-OpenFunctions-v2 (FC),67.44%,61.42%,73.64%,58.73%,68.75%,41.67%,76.34%,73.17%
35,Open-Mixtral-8x22b (FC),66.86%,71.16%,73.26%,72.32%,6.25%,41.67%,59.54%,82.93%
36,Meta-Llama-3-70B-Instruct (Prompt),66.15%,79.10%,78.68%,79.65%,68.75%,66.67%,45.14%,92.68%
37,Gemini-1.0-Pro-002 (FC),66.10%,67.04%,75.19%,65.96%,50.00%,37.50%,64.57%,68.29%
38,Qwen2.5-7B-Instruct (Prompt),65.97%,72.13%,72.48%,72.32%,62.50%,66.67%,55.31%,92.68%
39,Open-Mixtral-8x7b (Prompt),64.95%,63.30%,57.36%,65.00%,68.75%,50.00%,67.31%,68.29%
40,Hammer2.0-1.5b (FC),63.22%,68.76%,70.54%,68.56%,56.25%,66.67%,53.37%,92.68%
41,Open-Mistral-Nemo-2407 (FC),62.37%,68.46%,71.71%,67.79%,62.50%,66.67%,53.14%,60.98%
42,DBRX-Instruct (Prompt),62.33%,72.06%,74.81%,71.65%,75.00%,58.33%,46.29%,87.80%
43,GPT-4o-2024-08-06 (Prompt),62.19%,42.55%,42.64%,42.82%,25.00%,41.67%,93.37%,36.59%
44,Hermes-2-Pro-Llama-3-8B (FC),61.79%,64.57%,67.44%,64.42%,56.25%,45.83%,57.83%,56.10%
45,Qwen2.5-1.5B-Instruct (Prompt),61.71%,60.37%,64.73%,59.88%,50.00%,41.67%,63.09%,75.61%
46,GPT-3.5-Turbo-0125 (FC),61.22%,76.25%,74.42%,77.82%,43.75%,50.00%,36.57%,97.56%
47,Llama-3.1-70B-Instruct (Prompt),61.13%,72.58%,77.13%,71.46%,87.50%,62.50%,42.17%,92.68%
48,Hermes-2-Pro-Llama-3-70B (FC),60.51%,55.28%,63.18%,53.04%,56.25%,66.67%,68.46%,60.98%
49,MiniCPM3-4B (FC),59.88%,50.71%,56.98%,49.47%,56.25%,33.33%,73.94%,58.54%
50,Llama-3.1-8B-Instruct (Prompt),57.93%,71.31%,71.32%,72.23%,50.00%,45.83%,36.57%,78.05%
51,Claude-3-Haiku-20240307 (FC tools-2024-04-04),57.66%,74.31%,74.03%,77.15%,0.00%,4.17%,30.40%,97.56%
52,Granite-20b-FunctionCalling (FC),57.49%,57.08%,65.12%,55.35%,43.75%,54.17%,56.34%,95.12%
53,Command-R-Plus (FC) (Original),57.26%,61.50%,66.67%,60.56%,56.25%,50.00%,49.14%,92.68%
54,Hermes-2-Pro-Mistral-7B (FC),56.46%,59.85%,64.73%,59.40%,43.75%,37.50%,50.40%,75.61%
55,Claude-3.5-Sonnet-20240620 (Prompt),54.24%,31.24%,65.12%,22.66%,37.50%,33.33%,90.97%,19.51%
56,Qwen2-7B-Instruct (Prompt),54.24%,61.57%,59.30%,62.20%,50.00%,66.67%,41.49%,87.80%
57,Mistral-Small-2402 (Prompt),53.98%,39.48%,18.22%,45.90%,12.50%,8.33%,76.69%,41.46%
58,Nexusflow-Raven-v2 (FC),53.49%,39.03%,39.92%,38.48%,56.25%,41.67%,74.97%,65.85%
59,xLAM-7b-fc-r (FC),53.44%,60.07%,75.58%,57.28%,43.75%,25.00%,42.51%,70.73%
60,mistral-large-2407 (Prompt),53.35%,67.42%,45.74%,73.10%,68.75%,54.17%,30.17%,90.24%
61,Hammer2.0-0.5b (FC),52.42%,45.17%,48.84%,44.07%,62.50%,41.67%,61.94%,85.37%
62,Llama-3.2-3B-Instruct (Prompt),50.91%,44.49%,47.67%,44.74%,0.00%,29.17%,60.11%,63.41%
63,Meta-Llama-3-8B-Instruct (Prompt),50.51%,59.78%,60.85%,60.75%,37.50%,20.83%,35.20%,75.61%
64,Open-Mistral-Nemo-2407 (Prompt),50.33%,75.06%,78.29%,74.54%,75.00%,62.50%,10.74%,90.24%
65,Gemini-1.0-Pro-002 (Prompt),45.67%,38.13%,41.47%,36.93%,68.75%,33.33%,55.54%,80.49%
66,Llama-3.1-70B-Instruct (FC),44.47%,51.01%,48.45%,52.56%,31.25%,25.00%,31.89%,100.00%
67,Gemma-2-2b-it (Prompt),41.63%,11.46%,11.24%,11.96%,0.00%,0.00%,89.03%,12.20%
68,Qwen2-1.5B-Instruct (Prompt),39.00%,41.87%,50.39%,40.50%,25.00%,20.83%,32.91%,75.61%
69,xLAM-1b-fc-r (FC),38.34%,54.31%,63.18%,54.19%,0.00%,0.00%,11.20%,97.56%
70,Llama-3.1-8B-Instruct (FC),33.23%,47.34%,48.06%,47.64%,31.25%,37.50%,8.91%,92.68%
71,Llama-3.2-1B-Instruct (Prompt),29.85%,8.91%,25.97%,4.82%,6.25%,4.17%,60.91%,48.78%
55,Qwen2-7B-Instruct (Prompt),54.24%,61.57%,59.30%,62.20%,50.00%,66.67%,41.49%,87.80%
56,Mistral-Small-2402 (Prompt),53.98%,39.48%,18.22%,45.90%,12.50%,8.33%,76.69%,41.46%
57,Nexusflow-Raven-v2 (FC),53.49%,39.03%,39.92%,38.48%,56.25%,41.67%,74.97%,65.85%
58,xLAM-7b-fc-r (FC),53.44%,60.07%,75.58%,57.28%,43.75%,25.00%,42.51%,70.73%
59,mistral-large-2407 (Prompt),53.35%,67.42%,45.74%,73.10%,68.75%,54.17%,30.17%,90.24%
60,Hammer2.0-0.5b (FC),52.42%,45.17%,48.84%,44.07%,62.50%,41.67%,61.94%,85.37%
61,Llama-3.2-3B-Instruct (Prompt),50.91%,44.49%,47.67%,44.74%,0.00%,29.17%,60.11%,63.41%
62,Meta-Llama-3-8B-Instruct (Prompt),50.51%,59.78%,60.85%,60.75%,37.50%,20.83%,35.20%,75.61%
63,Open-Mistral-Nemo-2407 (Prompt),50.33%,75.06%,78.29%,74.54%,75.00%,62.50%,10.74%,90.24%
64,Gemini-1.0-Pro-002 (Prompt),48.38%,48.61%,50.00%,48.41%,56.25%,37.50%,46.29%,85.37%
65,Llama-3.1-70B-Instruct (FC),44.47%,51.01%,48.45%,52.56%,31.25%,25.00%,31.89%,100.00%
66,Gemma-2-2b-it (Prompt),41.63%,11.46%,11.24%,11.96%,0.00%,0.00%,89.03%,12.20%
67,Qwen2-1.5B-Instruct (Prompt),39.00%,41.87%,50.39%,40.50%,25.00%,20.83%,32.91%,75.61%
68,xLAM-1b-fc-r (FC),38.34%,54.31%,63.18%,54.19%,0.00%,0.00%,11.20%,97.56%
69,Llama-3.1-8B-Instruct (FC),33.23%,47.34%,48.06%,47.64%,31.25%,37.50%,8.91%,92.68%
70,Llama-3.2-1B-Instruct (Prompt),29.85%,8.91%,25.97%,4.82%,6.25%,4.17%,60.91%,48.78%
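The `data_live.csv` rows follow the schema in the header above: one row per model with its rank, overall live accuracy, the AST summary and its four Python AST sub-scores, and the irrelevance/relevance detection rates, all stored as percentage strings. A minimal sketch of re-deriving the ranking from the file is below; it assumes only the column names shown in the header row (e.g. `Live Overall Acc`) and that percentages are written like `76.23%`.

```python
# Illustrative sketch, not part of this PR: load data_live.csv and
# re-derive the ranking from the "Live Overall Acc" column.
import csv

def pct(value: str) -> float:
    """Convert a percentage string such as '76.23%' to a float."""
    return float(value.rstrip("%"))

with open("data_live.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Sort descending by overall live accuracy and print the top five entries.
rows.sort(key=lambda r: pct(r["Live Overall Acc"]), reverse=True)
for i, row in enumerate(rows[:5], start=1):
    print(f'{i}. {row["Model"]}: {row["Live Overall Acc"]}')
```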
71 changes: 71 additions & 0 deletions data_multi_turn.csv
@@ -0,0 +1,71 @@
Rank,Model,Multi Turn Overall Acc,Base,Miss Func,Miss Param,Long Context
1,GPT-4o-2024-08-06 (FC),45.25%,54.50%,44.00%,34.50%,48.00%
2,Claude-3.5-Sonnet-20240620 (FC),40.00%,46.00%,39.00%,35.00%,40.00%
3,GPT-4-turbo-2024-04-09 (FC),39.25%,54.50%,32.50%,29.50%,40.50%
4,o1-preview-2024-09-12 (Prompt),36.62%,43.00%,38.50%,32.50%,32.50%
5,o1-mini-2024-09-12 (Prompt),33.50%,40.50%,32.50%,26.50%,34.50%
6,GPT-4o-mini-2024-07-18 (FC),28.25%,40.50%,15.50%,24.00%,33.00%
7,Claude-3-Opus-20240229 (FC tools-2024-04-04),28.12%,30.00%,29.50%,28.00%,25.00%
8,GPT-4-turbo-2024-04-09 (Prompt),26.75%,36.50%,24.00%,17.00%,29.50%
9,Claude-3-Haiku-20240307 (FC tools-2024-04-04),20.62%,27.50%,15.00%,17.50%,22.50%
10,Gemini-1.5-Pro-002 (FC),19.13%,26.00%,13.50%,19.50%,17.50%
11,Gemini-1.5-Flash-001 (Prompt),17.62%,25.50%,16.00%,12.00%,17.00%
12,GPT-4o-2024-08-06 (Prompt),17.62%,21.50%,14.00%,15.00%,20.00%
13,xLAM-8x22b-r (FC),17.38%,25.50%,20.50%,15.00%,8.50%
14,Functionary-Medium-v3.1 (FC),17.25%,28.50%,12.50%,23.50%,4.50%
15,GPT-3.5-Turbo-0125 (FC),16.88%,28.00%,13.00%,17.00%,9.50%
16,mistral-large-2407 (FC),16.75%,23.00%,12.50%,15.50%,16.00%
17,Gemini-1.5-Pro-002 (Prompt),16.25%,20.00%,15.00%,14.50%,15.50%
18,GPT-4o-mini-2024-07-18 (Prompt),14.50%,20.00%,11.50%,10.00%,16.50%
19,Llama-3.1-70B-Instruct (Prompt),14.25%,18.50%,15.50%,10.00%,13.00%
20,xLAM-8x7b-r (FC),13.88%,18.50%,14.00%,12.50%,10.50%
21,Gemini-1.5-Pro-001 (Prompt),13.12%,14.50%,13.50%,13.50%,11.00%
22,Gemini-1.5-Pro-001 (FC),12.75%,16.00%,11.00%,12.50%,11.50%
23,Gemini-1.5-Flash-002 (Prompt),12.50%,15.00%,14.50%,9.00%,11.50%
24,Gemini-1.5-Flash-001 (FC),10.88%,13.00%,10.00%,13.00%,7.50%
25,Llama-3.1-8B-Instruct (Prompt),10.50%,14.00%,10.50%,8.00%,9.50%
26,Gemini-1.5-Flash-002 (FC),9.75%,15.00%,5.00%,8.00%,11.00%
27,mistral-large-2407 (Prompt),9.62%,14.50%,11.00%,6.00%,7.00%
28,Functionary-Small-v3.1 (FC),8.38%,15.50%,0.50%,12.50%,5.00%
29,Open-Mistral-Nemo-2407 (FC),8.00%,12.00%,5.00%,10.50%,4.50%
30,ToolACE-8B (FC),7.88%,8.50%,10.50%,5.50%,7.00%
31,xLAM-7b-r (FC),6.88%,11.50%,7.00%,6.00%,3.00%
32,Qwen2.5-7B-Instruct (Prompt),6.38%,8.00%,7.50%,6.00%,4.00%
33,GPT-3.5-Turbo-0125 (Prompt),5.75%,7.50%,7.00%,4.00%,4.50%
34,Hammer2.0-7b (FC),5.62%,9.50%,2.00%,7.50%,3.50%
35,Meta-Llama-3-70B-Instruct (Prompt),5.50%,9.50%,4.50%,5.50%,2.50%
36,Llama-3.1-8B-Instruct (FC),4.00%,4.50%,3.50%,5.00%,3.00%
37,Llama-3.1-70B-Instruct (FC),2.75%,4.50%,2.00%,2.00%,2.50%
38,Granite-20b-FunctionCalling (FC),2.75%,5.00%,1.50%,3.00%,1.50%
39,Qwen2-7B-Instruct (Prompt),2.63%,3.50%,3.50%,1.50%,2.00%
40,Gemini-1.0-Pro-002 (FC),2.50%,4.00%,2.50%,2.50%,1.00%
41,Mistral-small-2402 (FC),2.12%,3.50%,0.00%,2.50%,2.50%
42,Gemma-2-27b-it (Prompt),2.12%,3.50%,2.00%,1.50%,1.50%
43,Llama-3.2-3B-Instruct (Prompt),2.12%,1.50%,2.00%,2.00%,3.00%
44,Qwen2.5-1.5B-Instruct (Prompt),1.50%,2.00%,2.00%,1.00%,1.00%
45,Hammer2.0-1.5b (FC),1.38%,2.50%,0.50%,1.00%,1.50%
46,Command-R-Plus (FC) (Original),1.38%,1.50%,0.00%,1.50%,2.50%
47,Gemini-1.0-Pro-002 (Prompt),1.25%,1.00%,3.50%,0.00%,0.50%
48,MiniCPM3-4B (FC),0.88%,1.50%,2.00%,0.00%,0.00%
49,Nexusflow-Raven-v2 (FC),0.88%,1.50%,0.50%,0.50%,1.00%
50,Gemma-2-9b-it (Prompt),0.75%,1.00%,2.00%,0.00%,0.00%
51,Open-Mixtral-8x22b (FC),0.62%,1.00%,0.00%,1.00%,0.50%
52,Open-Mixtral-8x7b (Prompt),0.62%,1.50%,0.00%,0.00%,1.00%
53,Mistral-Medium-2312 (Prompt),0.50%,1.50%,0.00%,0.00%,0.50%
54,Open-Mixtral-8x22b (Prompt),0.50%,0.50%,0.50%,0.00%,1.00%
55,Hermes-2-Pro-Llama-3-8B (FC),0.38%,1.00%,0.00%,0.50%,0.00%
56,Hammer2.0-0.5b (FC),0.38%,0.50%,0.00%,0.50%,0.50%
57,Command-R-Plus (Prompt) (Original),0.38%,1.00%,0.00%,0.00%,0.50%
58,Hermes-2-Pro-Mistral-7B (FC),0.25%,0.50%,0.00%,0.00%,0.50%
59,Mistral-Small-2402 (Prompt),0.25%,0.50%,0.00%,0.00%,0.50%
60,Hermes-2-Pro-Llama-3-70B (FC),0.25%,0.50%,0.00%,0.00%,0.50%
61,GoGoAgent,0.25%,0.50%,0.50%,0.00%,0.00%
62,Open-Mistral-Nemo-2407 (Prompt),0.12%,0.00%,0.50%,0.00%,0.00%
63,xLAM-1b-fc-r (FC),0.12%,0.00%,0.00%,0.00%,0.50%
64,Qwen2-1.5B-Instruct (Prompt),0.12%,0.00%,0.50%,0.00%,0.00%
65,DBRX-Instruct (Prompt),0.00%,0.00%,0.00%,0.00%,0.00%
66,Gemma-2-2b-it (Prompt),0.00%,0.00%,0.00%,0.00%,0.00%
67,Llama-3.2-1B-Instruct (Prompt),0.00%,0.00%,0.00%,0.00%,0.00%
68,xLAM-7b-fc-r (FC),0.00%,0.00%,0.00%,0.00%,0.00%
69,Gorilla-OpenFunctions-v2 (FC),0.00%,0.00%,0.00%,0.00%,0.00%
70,Meta-Llama-3-8B-Instruct (Prompt),0.00%,0.00%,0.00%,0.00%,0.00%
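In the new `data_multi_turn.csv`, the overall column appears to be the unweighted mean of the four subcategory columns, rounded to two decimals; for example, GPT-4o-2024-08-06 (FC): (54.50 + 44.00 + 34.50 + 48.00) / 4 = 45.25. This is an observation from the rows above rather than a documented BFCL formula; a small consistency check under that assumption:

```python
# Consistency check (assumption inferred from the rows above, not a
# documented BFCL formula): "Multi Turn Overall Acc" should equal the
# unweighted mean of the four per-category accuracies, up to rounding.
import csv

CATEGORIES = ["Base", "Miss Func", "Miss Param", "Long Context"]

def pct(value: str) -> float:
    """Convert a percentage string such as '45.25%' to a float."""
    return float(value.rstrip("%"))

with open("data_multi_turn.csv", newline="") as f:
    for row in csv.DictReader(f):
        mean = sum(pct(row[c]) for c in CATEGORIES) / len(CATEGORIES)
        reported = pct(row["Multi Turn Overall Acc"])
        assert abs(mean - reported) < 0.01, (row["Model"], mean, reported)
```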