Scale-MultiChallenge

diverse

Pending Human Review

A multi-domain challenge set created by Scale AI to test models across diverse tasks.

Published: 2024

Score Range: 0-100

Top Score: 56.51

Scale-MultiChallenge Leaderboard

Rank	Model	Provider	Score	Parameters	Released	Type
1	o3	OpenAI	56.51		2025-04-16	Multimodal
2	o4-mini	OpenAI	42.99		2025-04-16	Multimodal
3	Nemotron 3 Nano	NVIDIA	38.5	31.6B (Total), ~3.2B (Active)	2025-12-15	Text

About Scale-MultiChallenge

Methodology

Scale-MultiChallenge reports scores on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.

Publication

This benchmark was published in 2024.

Technical Paper

Related Benchmarks

BIG-bench

diverse

Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of 204 diverse tasks.

Published: 2022

Scale: 0-100

Technical Paper

Humanity's Last Exam

diverse

A challenging benchmark of novel problems designed to test the limits of AI capabilities.

Published: 2023

Scale: 0-100

Technical Paper