Global-MMLU

knowledge · Verified

A multilingual evaluation set spanning 42 languages that combines machine translations of MMLU questions with professional translations and crowd-sourced post-edits. It includes cultural sensitivity annotations that classify each question as Culturally Sensitive (CS) or Culturally Agnostic (CA).

Published: 2025
Score Range: 0-100
Top Score: 88.6

Global-MMLU Leaderboard

Rank  Model           Provider  Score  Parameters        Released    Type
1     Gemini 2.5 Pro  Google    88.6   -                 2025-05-06  Multimodal
2     Gemma 3         Google    75.4   1B, 4B, 12B, 27B  2025-03-12  Multimodal

About Global-MMLU

Methodology

Global-MMLU reports model performance as a single score on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
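As a rough illustration of how a 0 to 100 score could be broken down by the CS and CA annotations, here is a minimal sketch. It assumes the score is percent accuracy on multiple-choice questions, which this page does not state; the record fields (prediction, answer, label) and the function name score_by_label are hypothetical and are not taken from the benchmark's tooling.

```python
from collections import defaultdict


def score_by_label(records):
    """Return percent accuracy overall and per cultural-sensitivity label.

    Assumption: each record is a dict with hypothetical keys
    'prediction' and 'answer' (e.g. 'A'..'D') and 'label' ('CS' or 'CA').
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        hit = int(record["prediction"] == record["answer"])
        # Count every question once overall and once under its own label.
        for key in ("overall", record["label"]):
            correct[key] += hit
            total[key] += 1
    # Scale to 0-100 and round to one decimal, matching the leaderboard style.
    return {key: round(100.0 * correct[key] / total[key], 1) for key in total}


# Toy data for illustration only, not real benchmark results.
records = [
    {"prediction": "A", "answer": "A", "label": "CS"},
    {"prediction": "B", "answer": "C", "label": "CS"},
    {"prediction": "D", "answer": "D", "label": "CA"},
    {"prediction": "A", "answer": "A", "label": "CA"},
]
scores = score_by_label(records)
print(scores)  # {'overall': 75.0, 'CS': 50.0, 'CA': 100.0}
```

Reporting the CS and CA subsets alongside the overall figure makes it easier to see whether a model's score depends on culturally specific knowledge. The paper remains the authority on how the published scores are actually computed.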

Publication

This benchmark was published in 2025.

Related Benchmarks