MMLU

Category: Knowledge

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathematics, history, law, and more.

Published: 2020
Scale: 0-100
Top Score: 92.3

MMLU Leaderboard

| Rank | Model | Provider | Score | Parameters | Released | Type |
|------|-------------------|-----------|-------|-----------------------|------------|------------|
| 1 | o1 | OpenAI | 92.3 | | 2024-09-12 | Multimodal |
| 2 | DeepSeek-R1 | DeepSeek | 90.8 | 671B (37B activated) | 2025-01-20 | Text |
| 3 | o1-preview | OpenAI | 90.8 | | 2024-09-12 | Text |
| 4 | Claude 3.5 Sonnet | Anthropic | 88.7 | | 2024-06-20 | Multimodal |
| 5 | GPT-4o | OpenAI | 88.7 | | 2024-05-13 | Multimodal |
| 6 | DeepSeek-V3 | DeepSeek | 88.5 | 671B total, 37B activated | 2024-12-26 | Text |
| 7 | Gemini 2.5 Flash | Google | 88.4 | | 2025-05-20 | Multimodal |
| 8 | Claude Opus 4 | Anthropic | 87.4 | | 2025-05-22 | Multimodal |
| 9 | Claude 3 Opus | Anthropic | 86.8 | | 2024-03-04 | Multimodal |
| 10 | Claude 3.7 Sonnet | Anthropic | 86.1 | | 2025-02-24 | Multimodal |

About MMLU

Description

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects spanning STEM, the humanities, the social sciences, and professional fields such as law and medicine. Every item is a four-option multiple-choice question, with difficulty ranging from elementary to advanced professional level.
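
To make the task format concrete, the sketch below renders a single MMLU-style item as a prompt. The question shown is an invented placeholder, and the exact prompt wording is an assumption for illustration; the layout (four lettered choices followed by "Answer:") follows the convention commonly used when evaluating models on MMLU.

```python
# A minimal sketch of an MMLU-style multiple-choice item and prompt.
# The question here is a made-up example; real MMLU items are drawn
# from 57 subjects and always have exactly four answer choices.

CHOICE_LABELS = ["A", "B", "C", "D"]

def format_prompt(subject: str, question: str, choices: list[str]) -> str:
    """Render one item in the common 'lettered choices + Answer:' layout."""
    lines = [
        f"The following is a multiple choice question about {subject}.",
        "",
        question,
    ]
    lines += [f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, choices)]
    lines.append("Answer:")
    return "\n".join(lines)

prompt = format_prompt(
    subject="high school mathematics",          # one of MMLU's 57 subjects
    question="What is the derivative of x^2?",  # hypothetical item
    choices=["2x", "x", "x^2", "2"],
)
print(prompt)
```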

Methodology

MMLU scores models on a scale of 0 to 100, where the score is the percentage of multiple-choice questions answered correctly; higher scores indicate better performance. For detailed information about the evaluation methodology, please refer to the original paper.
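
As a concrete illustration of the 0-100 scale, the sketch below computes a score as the percentage of items where the predicted answer letter matches the answer key. The function name and the example data are assumptions for illustration, not part of any official evaluation harness.

```python
# A minimal sketch of MMLU-style scoring: the benchmark score is simply
# accuracy over multiple-choice items, expressed on a 0-100 scale.
# The predictions and answer key below are hypothetical placeholders.

def mmlu_score(predictions: list[str], answers: list[str]) -> float:
    """Return the percentage of items where the predicted letter matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must align one-to-one")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# Example: 3 of 4 hypothetical items answered correctly -> 75.0
print(mmlu_score(["A", "C", "B", "D"], ["A", "C", "B", "A"]))
```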

Publication

This benchmark was published in 2020 in the paper "Measuring Massive Multitask Language Understanding" by Hendrycks et al. (arXiv:2009.03300).
