TAU-bench

Category: tool-use · Status: Pending verification

TAU-bench (τ-bench, the Tool-Agent-User benchmark) evaluates models on their ability to use tools while interacting with a simulated human user and following domain-specific policies.

Published: 2024
Score Range: 0-100
Top Score: 81.2

TAU-bench Leaderboard

| Rank | Model             | Provider  | Score | Parameters | Released   | Type       |
|------|-------------------|-----------|-------|------------|------------|------------|
| 1    | Claude 3.7 Sonnet | Anthropic | 81.2  | —          | 2025-02-24 | Multimodal |
| 2    | o3                | OpenAI    | 73.9  | —          | 2025-04-16 | Multimodal |
| 3    | o4-mini           | OpenAI    | 71.8  | —          | 2025-04-16 | Multimodal |
| 4    | Claude 3.5 Haiku  | Anthropic | 51.0  | —          | 2024-10-22 | Multimodal |

About TAU-bench

Methodology

TAU-bench evaluates model performance using a standardized scoring methodology. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance; a score corresponds to the percentage of tasks the agent completes successfully. The original paper also reports pass^k, the probability that an agent succeeds on the same task in all of k independent trials, which measures reliability rather than single-attempt ability. For detailed information about the scoring system and methodology, please refer to the original paper.
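
To make the pass^k metric concrete, below is a minimal Python sketch of the unbiased estimator used for metrics of this form: for a task with c successes out of n trials, pass^k is estimated as C(c, k) / C(n, k), then averaged over tasks. The function names and the example success counts are illustrative, not taken from the benchmark's released code; multiply by 100 to match the 0-100 leaderboard scale.

```python
from math import comb

def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Unbiased estimate of pass^k for one task: the probability that
    k i.i.d. trials all succeed, given `successes` out of `trials` runs."""
    if k > trials:
        raise ValueError("k cannot exceed the number of trials")
    # math.comb(c, k) is 0 when c < k, so tasks with too few
    # successes contribute 0, as expected.
    return comb(successes, k) / comb(trials, k)

def benchmark_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass^k over all tasks; each entry is (successes, trials)."""
    return sum(pass_hat_k(c, n, k) for c, n in results) / len(results)

# Hypothetical example: three tasks, 8 trials each.
results = [(8, 8), (5, 8), (2, 8)]
print(benchmark_pass_hat_k(results, k=1))  # plain success rate: 0.625
print(benchmark_pass_hat_k(results, k=4))  # all-of-4 reliability: ~0.357
```

Note how pass^4 is far below pass^1 on the same data: an agent that solves a task only some of the time is heavily penalized, which is the point of reporting reliability separately from single-shot success.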

Publication

This benchmark was published in 2024. Read the full paper for details.