Michelangelo Long-Context Reasoning (128k)

Reasoning · Verified

MRCR (Multi-Round Coreference Resolution) is part of the Michelangelo benchmark suite, which evaluates long-context reasoning in LLMs. It tests a model's ability to track identities and references across adversarial conversation histories of up to 1M tokens; this leaderboard covers the 128k-token setting. Unlike retrieval-based benchmarks, MRCR requires synthesis, reasoning, and contextual understanding over extended contexts. The benchmark is synthetically generated to avoid pretraining contamination, provides unambiguous scoring metrics, and is recommended over traditional retrieval benchmarks for evaluating long-context reasoning capabilities.

Published: 2024
Score Range: 0-100
Top Score: 74
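
The setup described above is easiest to see concretely. The sketch below builds one MRCR-style example: several similar "needle" turns are planted in a long synthetic conversation, the model is asked to reproduce a specific one, and its response is graded against the reference. The topic list, prompt wording, placeholder replies, and the SequenceMatcher-based similarity score are illustrative assumptions, not the official Michelangelo generator or metric.

```python
import random
from difflib import SequenceMatcher

# Illustrative MRCR-style task builder. The topics, prompt wording, filler
# turns, and the similarity-based scoring below are assumptions made for
# this sketch; they are not the official Michelangelo generator or metric.
TOPICS = ["a lighthouse", "a tandem bicycle", "a paper crane", "a thunderstorm"]

def build_task(num_needles: int = 4, filler_turns: int = 50, seed: int = 0):
    """Return (messages, reference) for one synthetic multi-round task."""
    rng = random.Random(seed)
    messages, needles = [], []
    for i in range(num_needles):
        topic = rng.choice(TOPICS)
        messages.append({"role": "user", "content": f"Write a short poem about {topic}."})
        answer = f"[poem {i + 1}, about {topic}]"  # placeholder assistant reply
        messages.append({"role": "assistant", "content": answer})
        needles.append(answer)
        # Distractor turns push the needles far apart in the context window.
        for _ in range(filler_turns):
            messages.append({"role": "user", "content": "Tell me a random fact."})
            messages.append({"role": "assistant", "content": "[a random fact]"})
    target = rng.randrange(num_needles)
    messages.append({
        "role": "user",
        "content": f"Reproduce, word for word, poem number {target + 1} "
                   f"from earlier in this conversation.",
    })
    return messages, needles[target]

def score_response(response: str, reference: str) -> float:
    """Per-example score in [0, 100] via string similarity (an assumed metric)."""
    return 100.0 * SequenceMatcher(None, response, reference).ratio()
```

Because the distractor requests look nearly identical to the target, keyword retrieval alone is not enough: the model has to resolve which of the coreferent items ("poem number 3") the final question refers to, which is what separates MRCR from needle-in-a-haystack retrieval.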

Michelangelo Long-Context Reasoning (128k) Leaderboard

Rank | Model                 | Provider | Score | Parameters | Released   | Type
1    | Gemini 2.5 Flash      | Google   | 74    | N/A        | 2025-05-20 | Multimodal
2    | Gemini 2.5 Flash-Lite | Google   | 30.6  | N/A        | 2025-06-17 | Multimodal

About Michelangelo Long-Context Reasoning (128k)

Methodology

Michelangelo Long-Context Reasoning (128k) evaluates model performance using a standardized scoring methodology: each model receives a single score on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring metric and methodology, refer to the original paper.
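
As a small illustration of this convention, the sketch below averages per-example scores (each already on the 0-100 scale, e.g. as produced by a string-similarity scorer) into a single reported number. Plain averaging is an assumption made here for illustration; the paper defines the exact aggregation.

```python
from statistics import mean

def benchmark_score(per_example_scores: list[float]) -> float:
    """Aggregate per-example scores (each 0-100) into one reported number.

    Plain averaging is assumed for illustration; see the paper for the
    aggregation actually used.
    """
    return round(mean(per_example_scores), 1)

print(benchmark_score([74.0, 80.5, 67.5]))  # -> 74.0
```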

Publication

This benchmark was published in 2024; see the technical paper, Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries, for full details.

Related Benchmarks