Global-MMLU
knowledge | Verified
A multilingual evaluation set spanning 42 languages that combines machine translations of MMLU questions with professional translations and crowd-sourced post-edits. It includes cultural sensitivity annotations that classify each question as Culturally Sensitive (CS) or Culturally Agnostic (CA).
Published: 2025
Score Range: 0-100
Top Score: 88.6
Global-MMLU Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Gemini 2.5 Pro | | 88.6 | | 2025-05-06 | Multimodal |
2 | Gemma 3 | | 75.4 | 1B, 4B, 12B, 27B | 2025-03-12 | Multimodal |
About Global-MMLU
Methodology
Global-MMLU scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
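As a rough illustration of how a 0 to 100 score could be broken down by the CS/CA annotations, the sketch below assumes the score is plain multiple-choice accuracy expressed as a percentage. That assumption, the function name `accuracy_by_label`, and the record fields `prediction`, `answer`, and `label` are all hypothetical and are not taken from this page or from the paper.

```python
from collections import defaultdict


def accuracy_by_label(records):
    """Return accuracy on a 0-100 scale, overall and per cultural label.

    Assumption: each record is a dict with hypothetical keys
    "prediction", "answer", and "label" ("CS" or "CA").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        hit = int(record["prediction"] == record["answer"])
        # Count every record once overall and once under its own label.
        for key in ("overall", record["label"]):
            correct[key] += hit
            total[key] += 1
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Made-up toy data, not real benchmark results.
records = [
    {"prediction": "A", "answer": "A", "label": "CS"},
    {"prediction": "B", "answer": "C", "label": "CS"},
    {"prediction": "D", "answer": "D", "label": "CA"},
    {"prediction": "B", "answer": "B", "label": "CA"},
]
scores = accuracy_by_label(records)
print(scores)
```

Reporting the CS and CA subsets separately alongside the overall figure makes it visible when a model's aggregate score hides a gap on culturally sensitive questions.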
Publication
This benchmark was published in 2025. Technical Paper