Global-MMLU

knowledge · Verified

A multilingual evaluation set spanning 42 languages, built from MMLU questions translated through a mix of machine translation, professional translation, and crowd-sourced post-editing. It includes cultural sensitivity annotations that classify questions as Culturally Sensitive (CS) or Culturally Agnostic (CA).

Published: 2025

Score Range: 0-100

Top Score: 88.6

Technical Paper

Global-MMLU Leaderboard

Rank	Model	Provider	Score	Parameters	Released	Type
1	Gemini 2.5 Pro	Google	88.6		2025-05-06	Multimodal
2	Gemma 3	Google	75.4	1B, 4B, 12B, 27B	2025-03-12	Multimodal

About Global-MMLU

Methodology

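Global-MMLU keeps MMLU's multiple-choice format: each question has four answer options (A to D), and a model is scored on how many questions it answers correctly. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance. Because questions carry CS or CA labels, results can also be reported separately for the culturally sensitive and culturally agnostic subsets. For details of the evaluation protocol, refer to the original paper.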
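As a rough illustration of this kind of scoring, the sketch below computes exact-match accuracy on a 0 to 100 scale and breaks it down by CS/CA label. It assumes each prediction and gold answer is a single option letter; the function names are hypothetical, and because this page does not say how scores are aggregated across the 42 languages, no cross-language averaging is shown.

```python
from collections import defaultdict

def accuracy(predictions, gold):
    """Percentage of predictions that exactly match the gold letter (0-100)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

def accuracy_by_label(predictions, gold, labels):
    """Accuracy computed separately for each cultural label (e.g. 'CS', 'CA')."""
    if not (len(predictions) == len(gold) == len(labels)):
        raise ValueError("predictions, gold and labels must have the same length")
    # Each label maps to a (predictions, gold) pair of lists.
    buckets = defaultdict(lambda: ([], []))
    for p, g, label in zip(predictions, gold, labels):
        buckets[label][0].append(p)
        buckets[label][1].append(g)
    return {label: accuracy(ps, gs) for label, (ps, gs) in buckets.items()}

preds  = ["A", "B", "C", "D"]
gold   = ["A", "B", "D", "D"]
labels = ["CS", "CS", "CA", "CA"]
print(accuracy(preds, gold))                   # 75.0
print(accuracy_by_label(preds, gold, labels))  # {'CS': 100.0, 'CA': 50.0}
```

Reporting the two subsets side by side makes it visible when a model's overall score hides a gap between culturally sensitive and culturally agnostic questions.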

Publication

This benchmark was published in 2025.

Technical Paper

Related Benchmarks

Global-MMLU-Lite

knowledge

A balanced collection of culturally sensitive and culturally agnostic MMLU questions, designed for efficient evaluation of multilingual models in 15 languages (including English).
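One way to build a balanced subset like this is to sample the same number of CS and CA questions for every language. The sketch below is hypothetical: the `(language, label, item)` tuple layout, `balanced_sample`, and `n_per_label` are illustrative assumptions, and it does not reproduce how Global-MMLU-Lite was actually assembled.

```python
import random
from collections import defaultdict

def balanced_sample(questions, n_per_label, seed=0):
    """Draw n_per_label CS items and n_per_label CA items for every language.

    `questions` is an iterable of (language, label, item) tuples; this layout
    is a hypothetical stand-in, not the official Lite construction procedure.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    pools = defaultdict(list)
    for lang, label, item in questions:
        pools[(lang, label)].append(item)

    sample = {}
    for lang in sorted({lang for lang, _ in pools}):
        for label in ("CS", "CA"):
            pool = pools.get((lang, label), [])
            if len(pool) < n_per_label:
                raise ValueError(f"only {len(pool)} {label} items for {lang}")
            # Sample without replacement so no item appears twice.
            sample[(lang, label)] = rng.sample(pool, n_per_label)
    return sample

# Usage: two languages with five CS and five CA items each.
data = [(lang, label, f"{lang}-{label}-{i}")
        for lang in ("en", "de") for label in ("CS", "CA") for i in range(5)]
subset = balanced_sample(data, n_per_label=3)
print({key: len(items) for key, items in subset.items()})
# {('de', 'CS'): 3, ('de', 'CA'): 3, ('en', 'CS'): 3, ('en', 'CA'): 3}
```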

Published: 2025

Score Range: 0-100

Technical Paper

MMLU-Pro

knowledge

MMLU-Pro is an enhanced benchmark with over 12,000 challenging questions across 14 domains: Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Other. Each question has 10 answer choices (vs. 4 in MMLU), and the set emphasizes reasoning-intensive questions.
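Going from 4 to 10 answer choices lowers the random-guessing baseline from 25% to 10% accuracy. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
import string

def chance_accuracy(num_choices):
    """Expected accuracy (0-100 scale) of guessing uniformly at random."""
    return 100.0 / num_choices

def option_letters(num_choices):
    """Answer labels: 'A'..'D' for 4 choices, 'A'..'J' for 10 choices."""
    return list(string.ascii_uppercase[:num_choices])

print(chance_accuracy(4))   # 25.0  (MMLU)
print(chance_accuracy(10))  # 10.0  (MMLU-Pro)
print(option_letters(10))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```

Because the floors differ, MMLU and MMLU-Pro scores on the same 0-100 scale are not directly comparable.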

Published: 2024

Score Range: 0-100

Technical Paper

MMLU

knowledge

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathematics, history, law, and more.

Published: 2020

Score Range: 0-100

Technical Paper