BIG-bench
diverse
Pending Verification
Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of 204 diverse tasks.
Published: 2022
Score Range: 0-100
Top Score: 93.1
BIG-bench Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 93.1 | | 2024-06-20 | Multimodal |
2 | Claude 3 Opus | Anthropic | 86.8 | | 2024-03-04 | Multimodal |
3 | Claude 3 Sonnet | Anthropic | 82.9 | | 2024-03-04 | Multimodal |
4 | Claude 3 Haiku | Anthropic | 73.7 | | 2024-03-04 | Multimodal |
5 | Gemini Diffusion | | 15 | | 2025-05-20 | Text |
About BIG-bench
Methodology
BIG-bench scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
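As a minimal sketch of how a leaderboard like the one above could be derived from reported scores, the Python snippet below checks that each score lies in the 0 to 100 range and orders entries from highest to lowest. The `Entry` class and `rank_entries` function are hypothetical names introduced for illustration; they are not part of BIG-bench or of this site, and ties are not given any special handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float


def rank_entries(entries, lo=0.0, hi=100.0):
    """Validate scores against [lo, hi] and return (rank, entry) pairs,
    highest score first. Assumes higher is better, as stated above."""
    for e in entries:
        if not lo <= e.score <= hi:
            raise ValueError(f"{e.model}: score {e.score} is outside [{lo}, {hi}]")
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


# Scores copied from the leaderboard table, deliberately listed out of order.
entries = [
    Entry("Claude 3 Haiku", 73.7),
    Entry("Gemini Diffusion", 15.0),
    Entry("Claude 3.5 Sonnet", 93.1),
    Entry("Claude 3 Sonnet", 82.9),
    Entry("Claude 3 Opus", 86.8),
]

ranked = rank_entries(entries)
for rank, entry in ranked:
    print(rank, entry.model, entry.score)
```

Rejecting out-of-range scores up front, rather than clamping them, keeps a data-entry mistake from silently changing the ranking.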
Publication
This benchmark was published in 2022.