Michelangelo Long-Context Reasoning (128k)

Reasoning · Verified

MRCR (Multi-Round Coreference Resolution) is part of the Michelangelo benchmark suite, which evaluates long-context reasoning in LLMs. It tests a model's ability to track identities and references across adversarial conversation histories of up to 1M tokens; this leaderboard covers the 128k-token setting. Unlike retrieval-based benchmarks, MRCR requires synthesis, reasoning, and contextual understanding over extended contexts. The benchmark is synthetically generated to avoid pretraining contamination, provides unambiguous scoring metrics, and is recommended over traditional retrieval benchmarks for evaluating long-context reasoning capabilities.

Published: 2024
Score Range: 0-100
Top Score: 74
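
The setup described above is easiest to see concretely. The sketch below builds one MRCR-style example: several similar "needle" turns are planted in a long synthetic conversation, the model is asked to reproduce a specific one, and its response is graded against the reference. The topic list, prompt wording, placeholder replies, and the SequenceMatcher-based similarity score are illustrative assumptions, not the official Michelangelo generator or metric.

```python
import random
from difflib import SequenceMatcher

# Illustrative MRCR-style task builder. The topics, prompt wording, filler
# turns, and the similarity-based scoring below are assumptions made for
# this sketch; they are not the official Michelangelo generator or metric.
TOPICS = ["a lighthouse", "a tandem bicycle", "a paper crane", "a thunderstorm"]

def build_task(num_needles: int = 4, filler_turns: int = 50, seed: int = 0):
    """Return (messages, reference) for one synthetic multi-round task."""
    rng = random.Random(seed)
    messages, needles = [], []
    for i in range(num_needles):
        topic = rng.choice(TOPICS)
        messages.append({"role": "user", "content": f"Write a short poem about {topic}."})
        answer = f"[poem {i + 1}, about {topic}]"  # placeholder assistant reply
        messages.append({"role": "assistant", "content": answer})
        needles.append(answer)
        # Distractor turns push the needles far apart in the context window.
        for _ in range(filler_turns):
            messages.append({"role": "user", "content": "Tell me a random fact."})
            messages.append({"role": "assistant", "content": "[a random fact]"})
    target = rng.randrange(num_needles)
    messages.append({
        "role": "user",
        "content": f"Reproduce, word for word, poem number {target + 1} "
                   f"from earlier in this conversation.",
    })
    return messages, needles[target]

def score_response(response: str, reference: str) -> float:
    """Per-example score in [0, 100] via string similarity (an assumed metric)."""
    return 100.0 * SequenceMatcher(None, response, reference).ratio()
```

Because the distractor requests look nearly identical to the target, keyword retrieval alone is not enough: the model has to resolve which of the coreferent items ("poem number 3") the final question refers to, which is what separates MRCR from needle-in-a-haystack retrieval.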

Michelangelo Long-Context Reasoning (128k) Leaderboard

Rank | Model                 | Provider | Score | Parameters | Released   | Type
1    | Gemini 2.5 Flash      | Google   | 74    | N/A        | 2025-05-20 | Multimodal
2    | Gemini 2.5 Flash-Lite | Google   | 30.6  | N/A        | 2025-06-17 | Multimodal

About Michelangelo Long-Context Reasoning (128k)

Methodology

Michelangelo Long-Context Reasoning (128k) evaluates model performance using a standardized scoring methodology: each model receives a single score on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring metric and methodology, refer to the original paper.
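
As a small illustration of this convention, the sketch below averages per-example scores (each already on the 0-100 scale, e.g. as produced by a string-similarity scorer) into a single reported number. Plain averaging is an assumption made here for illustration; the paper defines the exact aggregation.

```python
from statistics import mean

def benchmark_score(per_example_scores: list[float]) -> float:
    """Aggregate per-example scores (each 0-100) into one reported number.

    Plain averaging is assumed for illustration; see the paper for the
    aggregation actually used.
    """
    return round(mean(per_example_scores), 1)

print(benchmark_score([74.0, 80.5, 67.5]))  # -> 74.0
```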

Publication

This benchmark was published in 2024; see the technical paper, Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries, for full details.

Related Benchmarks