MMLU

Category: knowledge
Status: Pending Verification

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathematics, history, law, and more.

Published: 2020
Score Range: 0-100
Top Score: 92.3

MMLU Leaderboard

Rank  Model              Provider     Score  Parameters                          Released    Type
1     o1                 OpenAI       92.3   -                                   2024-09-12  Multimodal
2     DeepSeek-R1        DeepSeek     90.8   671B (37B activated)                2025-01-20  Text
3     o1-preview         OpenAI       90.8   -                                   2024-09-12  Text
4     GPT-4.1            OpenAI       90.2   -                                   2025-04-14  Multimodal
5     GPT-OSS-120B       OpenAI       90     117B total (5.1B active per token)  2025-08-05  Text
6     Kimi K2            Moonshot AI  89.5   1T total, 32B activated             2025-07-11  Text
7     Claude 3.5 Sonnet  Anthropic    88.7   -                                   2024-06-20  Multimodal
8     GPT-4o             OpenAI       88.7   -                                   2024-05-13  Multimodal
9     DeepSeek-V3        DeepSeek     88.5   671B total, 37B activated           2024-12-26  Text
10    Gemini 2.5 Flash   Google       88.4   -                                   2025-05-20  Multimodal

About MMLU

Methodology

MMLU consists of four-option multiple-choice questions. A model's score is its accuracy, the percentage of questions answered correctly, reported on a scale of 0 to 100, where higher is better. For details of the scoring and evaluation setup, refer to the original paper.
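As a rough illustration of how a 0 to 100 score can arise, the sketch below assumes the score is plain accuracy over multiple-choice answers. This page does not say whether leaderboard entries average over all questions (micro) or over subjects (macro), so both are shown. The function names and the toy records are hypothetical and are not taken from any official evaluation code.

```python
from collections import defaultdict


def accuracy_percent(predictions, answers):
    """Percentage of predictions that match the gold answers (0 to 100)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def macro_average_percent(records):
    """Mean of per-subject accuracies.

    Each record is a (subject, prediction, answer) tuple.
    """
    by_subject = defaultdict(lambda: ([], []))
    for subject, pred, gold in records:
        by_subject[subject][0].append(pred)
        by_subject[subject][1].append(gold)
    per_subject = [accuracy_percent(p, g) for p, g in by_subject.values()]
    return sum(per_subject) / len(per_subject)


# Hypothetical toy data: (subject, model's choice, correct choice).
records = [
    ("law", "A", "A"),
    ("law", "B", "C"),
    ("law", "D", "D"),
    ("law", "C", "C"),
    ("history", "B", "B"),
]

micro = accuracy_percent([r[1] for r in records], [r[2] for r in records])
macro = macro_average_percent(records)
print(micro)  # 80.0 (4 of 5 questions correct)
print(macro)  # 87.5 (mean of 75.0 for law and 100.0 for history)
```

The two averages differ whenever subjects have different numbers of questions, which is one reason scores reported by different sources may not be directly comparable.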

Publication

This benchmark was published in 2020; see the technical paper.

Related Benchmarks