BIG-bench

diverse | Pending Verification

Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of 204 diverse tasks.

Published: 2022
Score Range: 0-100
Top Score: 93.1

BIG-bench Leaderboard

Rank  Model              Provider   Score  Parameters  Released    Type
1     Claude 3.5 Sonnet  Anthropic  93.1   -           2024-06-20  Multimodal
2     Claude 3 Opus      Anthropic  86.8   -           2024-03-04  Multimodal
3     Claude 3 Sonnet    Anthropic  82.9   -           2024-03-04  Multimodal
4     Claude 3 Haiku     Anthropic  73.7   -           2024-03-04  Multimodal
5     Gemini Diffusion   Google     15     -           2025-05-20  Text

About BIG-bench

Methodology

BIG-bench scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
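
This page does not describe how per-task results are combined into a single 0 to 100 figure. As an illustration only, the sketch below assumes each task reports a raw score together with a lowest and highest reference value, rescales each raw score onto 0 to 100, and averages across tasks. The function names and sample numbers are hypothetical and are not taken from the leaderboard or the paper.

```python
from statistics import mean


def normalize_score(raw, low, high):
    """Rescale a raw task score onto 0-100.

    `low` and `high` are assumed per-task reference values
    (for example, chance-level and perfect performance).
    """
    if high == low:
        raise ValueError("high and low reference scores must differ")
    return 100.0 * (raw - low) / (high - low)


def aggregate(task_results):
    """Average the normalized scores of (raw, low, high) tuples."""
    return mean(normalize_score(raw, low, high) for raw, low, high in task_results)


# Hypothetical results for two tasks: (raw score, low reference, high reference).
results = [
    (0.625, 0.25, 1.0),  # four-way multiple choice, chance level 0.25
    (0.75, 0.0, 1.0),    # exact-match task, chance level 0.0
]
score = aggregate(results)
print(f"aggregate score: {score:.1f}")
```

Under this assumed scheme, a model performing at the low reference on a task contributes 0 for that task, so chance-level guessing is not rewarded.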

Publication

This benchmark was published in 2022.

Related Benchmarks