Global-MMLU-Lite

knowledge · Verified

A balanced collection of culturally sensitive and culturally agnostic MMLU tasks designed for efficient evaluation of multilingual models in 15 languages (including English).

Published: 2025

Score Range: 0-100

Top Score: 86.5

Technical Paper

Global-MMLU-Lite Leaderboard

Rank	Model	Provider	Score	Parameters	Released	Type
1	Gemini 2.0 Pro	Google	86.5		2025-02-05	Multimodal
2	Gemini 2.5 Flash-Lite	Google	84.5		2025-06-17	Multimodal
3	Gemini 2.0 Flash	Google	83.4		2025-02-25	Multimodal
4	Gemini 2.0 Flash-Lite	Google	78.2		2025-02-25	Multimodal
5	Nemotron 3 Nano	NVIDIA	74.47	31.6B (Total), ~3.2B (Active)	2025-12-15	Text

About Global-MMLU-Lite

Methodology

Global-MMLU-Lite evaluates model performance using a standardized scoring methodology. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For detailed information about the scoring system and methodology, please refer to the original paper.
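As a rough illustration of how a 0 to 100 score can arise, the sketch below assumes the score is plain multiple-choice accuracy expressed as a percentage, which is how MMLU-style benchmarks are commonly scored. This page does not state the exact formula, so treat that as an assumption. The function name `accuracy_scores`, the record layout, and the sample data are hypothetical and not taken from the benchmark's own tooling.

```python
from collections import defaultdict


def accuracy_scores(records):
    """Return percent-correct (0-100) overall and per cultural label.

    `records` is an iterable of (label, predicted, gold) tuples, where
    label is "CS" (culturally sensitive) or "CA" (culturally agnostic).
    This layout is a hypothetical one chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, predicted, gold in records:
        # Count each question once for its own label and once overall.
        for key in (label, "overall"):
            total[key] += 1
            correct[key] += int(predicted == gold)
    # Assumed metric: accuracy scaled to the 0-100 range.
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Made-up answers, only to exercise the function.
sample = [
    ("CS", "A", "A"),
    ("CS", "B", "C"),
    ("CA", "D", "D"),
    ("CA", "A", "A"),
]
scores = accuracy_scores(sample)
print(scores)
```

Reporting the CS and CA subsets separately alongside the overall figure makes it easier to see whether a model's score is driven by culturally agnostic questions.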

Publication

This benchmark was published in 2025.

Related Benchmarks

Global-MMLU

knowledge

A multilingual evaluation set spanning 42 languages that combines machine translations of MMLU questions with professional translations and crowd-sourced post-edits. It includes cultural sensitivity annotations that classify each question as Culturally Sensitive (CS) or Culturally Agnostic (CA).

Published: 2025

Scale: 0-100


MMLU-Pro

knowledge

MMLU-Pro is an enhanced benchmark with over 12,000 challenging questions across 14 domains including Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Others. It features 10 answer choices per question (vs. 4 in MMLU) and focuses on complex reasoning tasks.

Published: 2025

Scale: 0-100


MMLU

knowledge

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathematics, history, law, and more.

Published: 2020

Scale: 0-100
