Scale-MultiChallenge
A multi-domain challenge set created by Scale AI to test models across diverse tasks.
Published: 2024
Score Range: 0-100
Top Score: 56.51
Scale-MultiChallenge Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | o3 | OpenAI | 56.51 | | 2025-04-16 | Multimodal |
| 2 | o4-mini | OpenAI | 42.99 | | 2025-04-16 | Multimodal |
| 3 | Nemotron 3 Nano | NVIDIA | 38.5 | 31.6B (Total), ~3.2B (Active) | 2025-12-15 | Text |
About Scale-MultiChallenge
Methodology
Scale-MultiChallenge reports a single score per model on a scale of 0 to 100, where higher scores indicate better performance; the leaderboard ranks models by this score. For details of the scoring methodology, refer to the original paper.
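The ranking in the leaderboard follows from the rule that higher scores are better. Below is a minimal Python sketch that reproduces the ranks from the listed scores. `Entry` and `rank_entries` are hypothetical names used only for illustration. Giving tied models the same rank ("1, 1, 3" style) is an assumption, not documented leaderboard behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    provider: str
    score: float  # reported on a 0-100 scale; higher is better


def rank_entries(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Return (rank, entry) pairs sorted by descending score.

    Assumption: tied scores share a rank and the next rank is skipped.
    """
    for e in entries:
        if not 0 <= e.score <= 100:
            raise ValueError(f"score out of range for {e.model}: {e.score}")
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    ranked: list[tuple[int, Entry]] = []
    for i, e in enumerate(ordered):
        if i > 0 and e.score == ordered[i - 1].score:
            rank = ranked[-1][0]  # same score as previous entry: reuse its rank
        else:
            rank = i + 1
        ranked.append((rank, e))
    return ranked


# Scores copied from the leaderboard table above.
entries = [
    Entry("Nemotron 3 Nano", "NVIDIA", 38.5),
    Entry("o3", "OpenAI", 56.51),
    Entry("o4-mini", "OpenAI", 42.99),
]

for rank, e in rank_entries(entries):
    print(rank, e.model, e.score)
```

Sorting on the raw score and validating the 0 to 100 range keeps the sketch aligned with how the scores are reported; any official tie-breaking rule would need to come from the paper.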
Publication
This benchmark was published in 2024. See the accompanying technical paper.