Berkeley Function-Calling Leaderboard
Category: coding | Verified
The first comprehensive evaluation of LLMs' function-calling capabilities. It tests several call forms, including parallel and multiple function calls, across diverse programming languages. Models are evaluated on execution accuracy and on their ability to abstain from calling a function when none is appropriate.
Published: 2024
Score Range: 0-100
Top Score: 70.8
Berkeley Function-Calling Leaderboard Rankings
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Qwen-3 | Alibaba | 70.8 | 235B (22B active) | 2025-04-29 | Text |
About Berkeley Function-Calling Leaderboard
Methodology
The Berkeley Function-Calling Leaderboard evaluates models with a standardized scoring methodology. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
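To make the idea of a 0 to 100 function-calling score concrete, here is a minimal Python sketch. It is not the benchmark's actual scoring code: the names `case_passes` and `score`, the call representation, and the exact-match rule are all assumptions made for illustration. A test case passes when the model's calls match the expected calls regardless of order (covering parallel calls), and an empty expected list means the model should abstain from calling anything.

```python
from typing import Any

# Hypothetical representation of one call: {"name": str, "arguments": dict}
Call = dict[str, Any]


def _canonical(call: Call) -> tuple:
    """Turn a call into a sortable, comparable form (name plus sorted arguments)."""
    args = tuple(sorted((key, repr(value)) for key, value in call["arguments"].items()))
    return (call["name"], args)


def case_passes(predicted: list[Call], expected: list[Call]) -> bool:
    """Pass if the predicted calls equal the expected calls, ignoring order.

    An empty `expected` list models the abstention case: the model passes
    only if it made no call at all.
    """
    return sorted(map(_canonical, predicted)) == sorted(map(_canonical, expected))


def score(cases: list[tuple[list[Call], list[Call]]]) -> float:
    """Percentage of passing cases, on a 0 to 100 scale."""
    if not cases:
        return 0.0
    passed = sum(case_passes(pred, exp) for pred, exp in cases)
    return 100.0 * passed / len(cases)


# Illustrative data only; these functions and arguments are made up.
weather_b = {"name": "get_weather", "arguments": {"city": "Berkeley"}}
weather_o = {"name": "get_weather", "arguments": {"city": "Oakland"}}

cases = [
    ([weather_b], [weather_b]),                        # single call, correct
    ([weather_o, weather_b], [weather_b, weather_o]),  # parallel calls, order ignored
    ([], []),                                          # correct abstention
    ([weather_o], []),                                 # should have abstained
]

print(score(cases))  # 75.0
```

Exact matching on arguments is the simplest possible rule and is chosen here only to keep the sketch short; the scoring actually used by the benchmark is described in the original paper.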
Publication
This benchmark was published in 2024; see the technical paper for details.