ARC

Reasoning | Pending Human Review

The AI2 Reasoning Challenge (ARC) tests reasoning with multiple-choice, grade-school science questions.

Published: 2018

Score Range: 0-100

Top Score: 96.4

ARC Leaderboard

Rank	Model	Provider	Score	Parameters	Released	Type
1	Claude 3 Opus	Anthropic	96.4		2024-03-04	Multimodal
2	Claude 3 Sonnet	Anthropic	93.2		2024-03-04	Multimodal
3	Nemotron 3 Nano	NVIDIA	91.89	31.6B (Total), ~3.2B (Active)	2025-12-15	Text
4	Claude 3 Haiku	Anthropic	89.2		2024-03-04	Multimodal
5	Mixtral 8×22B	Mistral AI	70	141B (39B active)	2024-04-17	Text

About ARC

Methodology

ARC scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
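
As a rough illustration of how a score on a 0 to 100 scale can arise, the sketch below assumes the score is simply the percentage of multiple-choice questions answered correctly. This page does not spell out the exact rule, so treat this as an assumption rather than the official procedure; the function name `arc_style_score` and the sample answers are hypothetical.

```python
def arc_style_score(predictions, answer_key):
    """Return the percentage of correctly answered questions (0 to 100).

    Assumption: plain accuracy. The original paper may define
    additional rules (for example, for tied answers).
    """
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("at least one question is required")
    # Count positions where the predicted label matches the key.
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return 100.0 * correct / len(answer_key)


# Hypothetical answer labels for four multiple-choice questions.
answer_key = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]

score = arc_style_score(predictions, answer_key)
print(round(score, 1))  # 3 of 4 correct -> 75.0
```

Under this assumption, a leaderboard score of 96.4 would mean roughly 96 of every 100 questions were answered correctly.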