MMLU-Pro
MMLU-Pro is an enhanced version of the MMLU benchmark with over 12,000 challenging questions across 14 domains: Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Other. Each question has 10 answer choices (vs. 4 in MMLU), and the benchmark focuses on complex reasoning tasks.
MMLU-Pro Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | DeepSeek-R1 | DeepSeek | 84 | 671B (37B activated) | 2025-01-20 | Text |
2 | Gemini 2.0 Pro | Google | 79.1 | | 2025-02-05 | Multimodal |
3 | Gemini 2.0 Flash | Google | 77.6 | | 2025-02-25 | Multimodal |
4 | DeepSeek-V3 | DeepSeek | 75.9 | 671B (37B activated) | 2024-12-26 | Text |
5 | Gemini 2.0 Flash-Lite | Google | 71.6 | | 2025-02-25 | Multimodal |
6 | Qwen-2 | Alibaba | 55.6 | 72B | 2024-06-11 | Text |
About MMLU-Pro
Methodology
MMLU-Pro scores range from 0 to 100, with higher scores indicating better performance. For details of the methodology, refer to the original paper.
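The page does not spell out how a 0 to 100 score is computed. As a minimal sketch, assuming the score is plain accuracy (the percentage of questions answered correctly), it could be calculated as below. The function names and the toy data are hypothetical and are not taken from the benchmark's own evaluation code.

```python
from typing import Sequence


def accuracy_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage (0-100) of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_baseline(num_choices: int) -> float:
    """Expected score (0-100) from uniform random guessing over num_choices options."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 100.0 / num_choices


# Hypothetical toy data: letters A-J stand for the ten answer choices.
answers = ["A", "C", "J", "F"]
predictions = ["A", "B", "J", "F"]

print(accuracy_score(predictions, answers))  # 75.0
print(chance_baseline(10))  # 10.0 (ten choices, as in MMLU-Pro)
print(chance_baseline(4))   # 25.0 (four choices, as in MMLU)
```

Under this assumption, moving from 4 to 10 answer choices lowers the expected score of random guessing from 25 to 10, which is one reason scores on the two benchmarks are not directly comparable.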
Publication
This benchmark was published in 2024.