Michelangelo Long-Context Reasoning (128k)
MRCR (Multi-Round Coreference Resolution) is part of the Michelangelo benchmark suite for evaluating long-context reasoning in LLMs. It tests a model's ability to track identities and references across adversarial conversation histories of up to 1M tokens; this leaderboard reports results at a 128k-token context length. Unlike retrieval-based benchmarks, MRCR requires synthesis, reasoning, and contextual understanding over the extended context rather than locating a single fact. The tasks are synthetically generated to avoid pretraining contamination and come with unambiguous scoring metrics, and the benchmark's authors recommend MRCR over traditional retrieval benchmarks for evaluating long-context reasoning capabilities.
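As a rough illustration of what a multi-round coreference task looks like (a simplified, hypothetical example, not the benchmark's exact data format), an MRCR item interleaves several near-identical requests in a long conversation and asks the model to reproduce the response tied to one specific occurrence:

```python
# Hypothetical, simplified MRCR-style item. The real benchmark generates much
# longer, adversarially padded conversations up to the target context length;
# the field names and prompt wording here are illustrative assumptions.
mrcr_item = {
    "conversation": [
        {"role": "user", "content": "Write a poem about penguins."},
        {"role": "assistant", "content": "Poem #1 about penguins ..."},
        {"role": "user", "content": "Write a story about sharks."},
        {"role": "assistant", "content": "Story #1 about sharks ..."},
        {"role": "user", "content": "Write a poem about penguins."},
        {"role": "assistant", "content": "Poem #2 about penguins ..."},
        # ... many additional distractor turns filling the 128k context ...
    ],
    "query": "Reproduce the second poem about penguins exactly.",
    "answer": "Poem #2 about penguins ...",
}
```

Answering correctly requires resolving which of the near-identical requests the query refers to, which is why simple keyword retrieval over the context is not sufficient.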
Michelangelo Long-Context Reasoning (128k) Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Google | 74 | | 2025-05-20 | Multimodal |
| 2 | Gemini 2.5 Flash-Lite | Google | 30.6 | | 2025-06-17 | Multimodal |
About Michelangelo Long-Context Reasoning (128k)
Methodology
Michelangelo Long-Context Reasoning (128k) evaluates model performance using a standardized scoring methodology. Scores are reported on a scale of 0 to 100, with higher scores indicating better performance. For details on how scores are computed, refer to the original Michelangelo paper.
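The exact metric is defined in the paper; as a hedged sketch only, one common way to turn a reproduction task like MRCR into a 0-100 score is a normalized string-similarity ratio between the model's output and the reference answer. The function below is an illustrative stand-in, not the benchmark's official scorer:

```python
from difflib import SequenceMatcher


def mrcr_style_score(model_output: str, reference: str) -> float:
    """Illustrative 0-100 score: normalized string similarity between the
    model's reproduction and the reference answer. The official Michelangelo
    scoring may differ; see the technical paper for the exact definition."""
    ratio = SequenceMatcher(None, model_output, reference).ratio()
    return 100.0 * ratio


# Example: an exact reproduction scores 100; a partial match earns partial credit.
print(mrcr_style_score("Poem #2 about penguins ...", "Poem #2 about penguins ..."))  # 100.0
print(mrcr_style_score("Poem #1 about penguins ...", "Poem #2 about penguins ..."))  # < 100.0
```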
Publication
This benchmark was published in 2024. Technical Paper