Berkeley Function-Calling Leaderboard
The first comprehensive evaluation of LLMs' function-calling capabilities. It tests several forms of function calling, including parallel and multiple function calls, across diverse programming languages, and evaluates models on execution accuracy and on their ability to withhold a function call when none is appropriate.
Berkeley Function-Calling Leaderboard Rankings
| Rank | Model | Provider | Score (0-100) | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | Qwen-3 | Alibaba | 70.8 | 235B (22B active) | 2025-04-29 | Text |
About Berkeley Function-Calling Leaderboard
Methodology
Berkeley Function-Calling Leaderboard scores models on a scale of 0 to 100, where higher scores indicate better performance. For details of the methodology, refer to the original paper.
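To make the 0 to 100 scale concrete, here is a minimal sketch of how a function-calling score of this kind could be computed. It is not the benchmark's actual scoring code: the names `Call`, `call`, `case_passes`, and `score`, the `get_weather` function, and the exact-match rule are all hypothetical simplifications, and the real procedure is the one described in the original paper. The sketch covers a single call, parallel calls compared without regard to order, and the case where the correct behavior is to make no call at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One function call: a name plus its keyword arguments (hypothetical type)."""
    name: str
    args: tuple  # sorted (key, value) pairs, so argument order does not matter


def call(name, **kwargs):
    """Build a Call with its arguments normalized into sorted pairs."""
    return Call(name, tuple(sorted(kwargs.items())))


def case_passes(expected, predicted):
    """Order-insensitive exact match between expected and predicted calls.

    An empty `expected` list means the model should withhold any call,
    so any predicted call makes the case fail.
    """
    return sorted(expected, key=repr) == sorted(predicted, key=repr)


def score(cases):
    """Percentage of passing (expected, predicted) pairs, on a 0-100 scale."""
    if not cases:
        return 0.0
    passed = sum(case_passes(expected, predicted) for expected, predicted in cases)
    return 100.0 * passed / len(cases)


# Assumed toy data: four test cases, three of which pass.
cases = [
    # Single call, matched exactly.
    ([call("get_weather", city="Paris")],
     [call("get_weather", city="Paris")]),
    # Parallel calls, same set in a different order.
    ([call("get_weather", city="Paris"), call("get_weather", city="Rome")],
     [call("get_weather", city="Rome"), call("get_weather", city="Paris")]),
    # No function applies, but the model called one anyway: fails.
    ([], [call("get_weather", city="Paris")]),
    # No function applies and the model correctly made no call.
    ([], []),
]

print(score(cases))  # 75.0
```

Exact matching is the strictest possible comparison. A real harness would more likely accept several valid argument values or execute the calls and compare results, which is what "execution accuracy" suggests.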
Publication
This benchmark was published in 2024.