GSM8K
Category: mathematics
Status: Pending Verification
Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade school math word problems.
Published: 2021
Score Range: 0-100
Top Score: 96.4
GSM8K Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 96.4 | | 2024-06-20 | Multimodal |
2 | Kimi K2 | Moonshot AI | 95 | 1T total, 32B activated | 2025-07-11 | Text |
3 | Claude 3 Opus | Anthropic | 95 | | 2024-03-04 | Multimodal |
4 | Claude 3 Sonnet | Anthropic | 92.3 | | 2024-03-04 | Multimodal |
5 | Qwen-2 | Alibaba | 89.5 | 72B | 2024-06-11 | Text |
6 | DeepSeek-V3 | DeepSeek | 89.3 | 671B total, 37B activated | 2024-12-26 | Text |
7 | Claude 3 Haiku | Anthropic | 88.9 | | 2024-03-04 | Multimodal |
8 | Mixtral 8×22B | Mistral AI | 88 | 141B (39B active) | 2024-04-17 | Text |
9 | Claude 2 | Anthropic | 88 | ~130B | 2023-07-11 | Text |
10 | Gemma 3 | | 78.5 | 1B, 4B, 12B, 27B | 2025-03-12 | Multimodal |
About GSM8K
Methodology
GSM8K scores are reported on a scale of 0 to 100, where higher is better. A score is typically the percentage of test problems for which the model's final numeric answer matches the reference answer. Reported numbers can depend on the prompting setup used by each provider, so refer to the original paper for details on the benchmark and its evaluation.
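As a rough illustration of answer-matching accuracy, the sketch below scores model outputs against reference solutions. It relies on the dataset's convention of ending each reference solution with `#### <answer>`. Everything else is an assumption made for illustration: the function names are hypothetical, and taking the last number in a completion as the model's answer is one common heuristic, not necessarily how any score on this leaderboard was produced.

```python
import re
from typing import Optional, Sequence

# Integers or decimals, optionally signed, with optional thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_reference_answer(solution: str) -> Optional[str]:
    """Return the number that follows the '####' marker in a reference solution."""
    if "####" not in solution:
        return None
    tail = solution.split("####")[-1]
    match = _NUMBER.search(tail)
    return match.group(0).replace(",", "") if match else None


def extract_model_answer(completion: str) -> Optional[str]:
    """Assumption: treat the last number in the model output as its final answer."""
    matches = _NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def gsm8k_score(completions: Sequence[str], solutions: Sequence[str]) -> float:
    """Percentage (0-100) of problems whose final answers match numerically."""
    if not solutions or len(completions) != len(solutions):
        raise ValueError("need equally sized, non-empty inputs")
    correct = 0
    for completion, solution in zip(completions, solutions):
        predicted = extract_model_answer(completion)
        expected = extract_reference_answer(solution)
        if predicted is None or expected is None:
            continue  # an unparseable answer counts as incorrect
        if float(predicted) == float(expected):
            correct += 1
    return 100.0 * correct / len(solutions)


# Hypothetical model outputs and reference solutions, for illustration only.
example_completions = [
    "She sells 16 - 3 - 4 = 9 eggs, earning 9 * 2 = $18.",
    "The answer is 5.",
]
example_solutions = [
    "16 - 3 - 4 = 9 eggs are sold. 9 * 2 = 18 dollars.\n#### 18",
    "2 + 1 = 3 bolts in total.\n#### 3",
]
example_score = gsm8k_score(example_completions, example_solutions)
```

Here the first completion matches its reference and the second does not, so `example_score` is 50.0. Real evaluation harnesses differ in how they prompt the model and parse its answer, which is one reason scores for the same model can vary between sources.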
Publication
This benchmark was published in 2021. Technical Paper: "Training Verifiers to Solve Math Word Problems" (Cobbe et al., 2021).