MMLU-Pro
knowledge | Verified
MMLU-Pro is an enhanced version of the MMLU benchmark, with over 12,000 challenging questions across 14 domains: Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Other. Each question offers up to 10 answer choices (vs. 4 in MMLU), and the benchmark focuses on complex reasoning tasks.
Published: 2025
Score Range: 0-100
Top Score: 84
MMLU-Pro Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | DeepSeek-R1 | DeepSeek | 84 | 671B total, 37B activated | 2025-01-20 | Text |
2 | Kimi K2 | Moonshot AI | 81.1 | 1T total, 32B activated | 2025-07-11 | Text |
3 | Grok 3 | xAI | 79.9 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal |
4 | Gemini 2.0 Pro | Google | 79.1 | | 2025-02-05 | Multimodal |
5 | Grok 3 Mini | xAI | 78.9 | Unknown | 2025-02-19 | Multimodal |
6 | Gemini 2.0 Flash | Google | 77.6 | | 2025-02-25 | Multimodal |
7 | DeepSeek-V3 | DeepSeek | 75.9 | 671B total, 37B activated | 2024-12-26 | Text |
8 | Grok-2 | xAI | 75.5 | Unknown | 2024-08-13 | Multimodal |
9 | Grok-2 mini | xAI | 72 | Unknown | 2024-08-13 | Multimodal |
10 | Gemini 2.0 Flash-Lite | Google | 71.6 | | 2025-02-25 | Multimodal |
About MMLU-Pro
Methodology
MMLU-Pro scores a model by its accuracy on multiple-choice questions. Scores are reported on a scale of 0 to 100 (the percentage of questions answered correctly), and higher is better. For the prompting setup and other details of the methodology, refer to the original paper.
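As a rough illustration of the 0 to 100 scale, the sketch below assumes the score is plain accuracy: the share of questions where the model's chosen option letter matches the answer key. The function name `accuracy_score_0_100` and the sample data are hypothetical. Prompt construction and answer extraction, which the original paper defines, are not covered here.

```python
from typing import Sequence

NUM_OPTIONS = 10  # assumption: option letters A-J, i.e. 10 choices per question


def accuracy_score_0_100(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of matching option letters, rounded to one decimal."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Compare letters case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return round(100.0 * correct / len(answers), 1)


# Hypothetical answer key and model outputs for five questions.
answers = ["A", "C", "J", "F", "B"]
predictions = ["A", "C", "D", "F", "b"]

score = accuracy_score_0_100(predictions, answers)
chance_baseline = 100.0 / NUM_OPTIONS  # expected score from uniform random guessing

print(score)            # 80.0
print(chance_baseline)  # 10.0
```

With 10 options, uniform random guessing lands near 10 on this scale, compared with 25 for a 4-option format, so a given score sits further above chance than the same number would on 4-choice MMLU.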
Publication
This benchmark was published in 2025.
Technical Paper