IMO-AnswerBench
A large-scale benchmark of 400 International Mathematical Olympiad-level problems with verifiable answers, spanning Algebra, Combinatorics, Geometry, and Number Theory across four difficulty levels.
IMO-AnswerBench Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google DeepMind | 83.3 | Proprietary | 2025-11-18 | Multimodal |
| 2 | Kimi K2 | Moonshot AI | 78.6 | 1T total, 32B activated | 2025-07-11 | Text |
About IMO-AnswerBench
Description
IMO-AnswerBench is part of the IMO-Bench suite introduced at EMNLP 2025 by Google DeepMind. It consists of 400 carefully chosen problems from past Olympiad competitions, modified by IMO medalists and mathematicians to avoid memorization.

## Problem Distribution

The benchmark spans four IMO categories:

- **Algebra**: 100 problems (11 pre-IMO, 46 IMO-Easy, 32 IMO-Medium, 11 IMO-Hard)
- **Combinatorics**: 100 problems (4 pre-IMO, 19 IMO-Easy, 31 IMO-Medium, 46 IMO-Hard)
- **Geometry**: 100 problems (13 pre-IMO, 44 IMO-Easy, 32 IMO-Medium, 11 IMO-Hard)
- **Number Theory**: 100 problems (2 pre-IMO, 20 IMO-Easy, 31 IMO-Medium, 47 IMO-Hard)

## Difficulty Levels

- **Pre-IMO**: Middle school or pre-Math Olympiad problems
- **IMO-Easy**: Equivalent to Problem 1 or Problem 4 at the IMO
- **IMO-Medium**: Equivalent to Problem 2 or Problem 5 at the IMO
- **IMO-Hard**: Equivalent to Problem 3 or Problem 6 at the IMO, or post-Math Olympiad problems

## Key Features

- Vetted by a panel of IMO medalists and mathematicians (10 gold and 5 silver medals combined)
- Problems require rigorous multi-step reasoning and creativity beyond simple formula application
- Diverse representation of topics, ideas, and domain knowledge across all categories
- Designed to push the frontiers of mathematical reasoning in AI systems
- Part of the benchmark suite that contributed to achieving gold-medal standard at the IMO

IMO-AnswerBench focuses on getting the right final answer and serves as a comprehensive test of mathematical problem-solving ability at the International Mathematical Olympiad level.
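For reference, the breakdown above can be written down as a small table in code. The sketch below (plain Python; the variable names are illustrative, not part of any official release) simply records the published counts and checks that each category sums to 100 problems and the benchmark to 400.

```python
# Published IMO-AnswerBench problem counts per category and difficulty level.
# Counts are copied from the distribution listed above; names are illustrative.
DISTRIBUTION = {
    "Algebra":       {"pre-IMO": 11, "IMO-Easy": 46, "IMO-Medium": 32, "IMO-Hard": 11},
    "Combinatorics": {"pre-IMO": 4,  "IMO-Easy": 19, "IMO-Medium": 31, "IMO-Hard": 46},
    "Geometry":      {"pre-IMO": 13, "IMO-Easy": 44, "IMO-Medium": 32, "IMO-Hard": 11},
    "Number Theory": {"pre-IMO": 2,  "IMO-Easy": 20, "IMO-Medium": 31, "IMO-Hard": 47},
}

# Each category contributes exactly 100 problems, for 400 in total.
for category, levels in DISTRIBUTION.items():
    assert sum(levels.values()) == 100, category
assert sum(sum(levels.values()) for levels in DISTRIBUTION.values()) == 400
```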
Methodology
IMO-AnswerBench evaluates models with a standardized scoring methodology: each model receives a score on a 0–100 scale, with higher scores indicating better performance. For details of the scoring system and methodology, please refer to the original paper.
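The exact grading procedure is defined in the paper; purely as an illustration of the 0–100 scale, the sketch below assumes the score is the percentage of problems whose predicted final answer matches the reference answer after a crude string normalization. The function names, the normalization, and the matching rule are all assumptions for this sketch, not the benchmark's official grading pipeline.

```python
def normalize(answer: str) -> str:
    """Crude normalization of a final-answer string (illustrative only)."""
    return " ".join(answer.strip().lower().split())

def answerbench_score(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Return a 0-100 score: the percentage of problems whose predicted final
    answer matches the reference answer under the crude normalization above."""
    correct = sum(
        normalize(predictions.get(problem_id, "")) == normalize(reference)
        for problem_id, reference in references.items()
    )
    return 100.0 * correct / len(references)

# Toy example: 2 of 3 problems answered correctly gives a score of about 66.7.
refs = {"p1": "2027", "p2": "n(n+1)/2", "p3": "f(x) = x"}
preds = {"p1": "2027", "p2": "n(n+1)/2", "p3": "f(x) = 2x"}
print(answerbench_score(preds, refs))  # 66.66...
```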
Publication
This benchmark was published in 2025.