Global-MMLU

knowledge, Verified

A multilingual evaluation set spanning 42 languages that combines machine translations of MMLU questions with professional translations and crowd-sourced post-edits. It includes cultural sensitivity annotations that classify each question as Culturally Sensitive (CS) or Culturally Agnostic (CA).

Published: 2025
Scale: 0-100
Top Score: 88.6

Global-MMLU Leaderboard

Rank  Model           Provider  Score  Parameters        Released    Type
1     Gemini 2.5 Pro  Google    88.6   -                 2025-05-06  Multimodal
2     Gemma 3         Google    75.4   1B, 4B, 12B, 27B  2025-03-12  Multimodal

About Global-MMLU

Methodology

Global-MMLU scores range from 0 to 100, and higher scores indicate better performance. For details on the methodology, refer to the original paper.
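
As a rough illustration of how a 0 to 100 score could be derived, the sketch below assumes the score is plain multiple-choice accuracy expressed as a percentage, reported overall and separately for the CS and CA subsets. This page does not state the exact scoring procedure, so that is an assumption; the record fields (label, answer, prediction) and the function name are hypothetical, and the records are made-up toy data rather than real results.

```python
from collections import defaultdict


def accuracy_by_label(records):
    """Return accuracy on a 0-100 scale, overall and per cultural label.

    Assumes each record is a dict with hypothetical keys:
      "label"      -> "CS" or "CA"
      "answer"     -> gold choice, e.g. "A"
      "prediction" -> model choice, e.g. "A"
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        hit = rec["prediction"] == rec["answer"]
        # Count every record once overall and once under its own label.
        for key in ("overall", rec["label"]):
            total[key] += 1
            correct[key] += int(hit)
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Toy records for illustration only (not real benchmark data).
records = [
    {"label": "CS", "answer": "A", "prediction": "A"},
    {"label": "CS", "answer": "B", "prediction": "C"},
    {"label": "CA", "answer": "D", "prediction": "D"},
    {"label": "CA", "answer": "C", "prediction": "C"},
]

scores = accuracy_by_label(records)
print(scores)
```

Reporting the CS and CA subsets side by side shows whether a model's overall score hides weaker performance on culturally sensitive questions.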

Publication

This benchmark was published in 2025.

Related Benchmarks