AIME-2025
Category: Mathematics
American Invitational Mathematics Examination (AIME) 2025 problems.
Published: 2025
Score Range: 0-100
Top Score: 99.2
AIME-2025 Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | Nemotron 3 Nano | NVIDIA | 99.2 | 31.6B total, ~3.2B active | 2025-12-15 | Text |
| 2 | GPT-OSS-20B | OpenAI | 98.7 | 21B total, 3.6B active per token | 2025-08-05 | Text |
| 3 | GPT-OSS-120B | OpenAI | 97.9 | 117B total, 5.1B active per token | 2025-08-05 | Text |
| 4 | GLM-4.7 | Z.ai | 95.7 | Unreleased | 2025-12-22 | Text |
| 5 | Gemini 3 Pro | Google | 95.0 | Proprietary | 2025-11-18 | Multimodal |
| 6 | Kimi K2 | Moonshot AI | 94.5 | 1T total, 32B active | 2025-07-11 | Text |
| 7 | Grok 3 | xAI | 93.3 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal |
| 8 | o4-mini | OpenAI | 92.7 | Undisclosed | 2025-04-16 | Multimodal |
| 9 | Grok 4 | xAI | 91.7 | Unknown | 2025-07-09 | Multimodal |
| 10 | Grok 3 Mini | xAI | 90.8 | Unknown | 2025-02-19 | Multimodal |
About AIME-2025
Methodology
AIME-2025 evaluates models on the problems from the 2025 American Invitational Mathematics Examination: two exams (AIME I and AIME II) of 15 problems each, where every answer is an integer from 0 to 999. Scores are reported on a 0-100 scale corresponding to the percentage of problems answered correctly; higher scores indicate better performance. For full details of the scoring methodology, refer to the original paper.
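Since the page does not publish its grading script, the following is a minimal sketch of how a 0-100 score of this kind could be computed, assuming exact-match grading of AIME's integer answers. The function name `aime_score` and the normalization step are illustrative assumptions, not the benchmark's actual harness.

```python
# Hypothetical sketch of a 0-100 AIME scorer, assuming exact-match
# grading of integer answers (AIME answers are integers from 0 to 999).
# Names and normalization are illustrative, not the benchmark's code.

def aime_score(predictions: list[str], answers: list[str]) -> float:
    """Return the percentage of problems answered correctly (0-100)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(
        # Normalize both sides to integers so "042" and "42" compare equal;
        # non-numeric predictions are skipped and thus count as incorrect.
        int(pred) == int(ans)
        for pred, ans in zip(predictions, answers)
        if pred.strip().isdigit()
    )
    return 100.0 * correct / len(answers)

# Example: 14 of 15 problems correct -> 93.3, matching the scale
# of the leaderboard scores above.
print(aime_score(["042"] * 15, ["42"] * 14 + ["7"]))
```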
Publication
This benchmark was published in 2025.