GSM8K
mathematics
Pending Verification
Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade school math word problems.
Published: 2021
Score Range: 0-100
Top Score: 96.4
GSM8K Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 96.4 | | 2024-06-20 | Multimodal |
2 | Kimi K2 | Moonshot AI | 95 | 1T total, 32B activated | 2025-07-11 | Text |
3 | Claude 3 Opus | Anthropic | 95 | | 2024-03-04 | Multimodal |
4 | Claude 3 Sonnet | Anthropic | 92.3 | | 2024-03-04 | Multimodal |
5 | Qwen-2 | Alibaba | 89.5 | 72B | 2024-06-11 | Text |
6 | DeepSeek-V3 | DeepSeek | 89.3 | 671B total, 37B activated | 2024-12-26 | Text |
7 | Claude 3 Haiku | Anthropic | 88.9 | | 2024-03-04 | Multimodal |
8 | Mixtral 8×22B | Mistral AI | 88 | 141B (39B active) | 2024-04-17 | Text |
9 | Claude 2 | Anthropic | 88 | ~130B | 2023-07-11 | Text |
10 | Gemma 3 | | 78.5 | 1B, 4B, 12B, 27B | 2025-03-12 | Multimodal |
About GSM8K
Methodology
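GSM8K scores are reported on a scale of 0 to 100, where higher scores indicate better performance. A score is typically the percentage of test problems for which the model's final numeric answer matches the reference answer. Reported results can differ in prompting setup (for example, the number of few-shot examples or the use of chain-of-thought reasoning), so scores from different sources are not always directly comparable. For detailed information about the scoring system and methodology, please refer to the original paper.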
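As a rough illustration of how a 0 to 100 score of this kind can be computed, the sketch below grades answers by exact match on the final number. It assumes reference solutions end with a `#### <answer>` line, as in the public GSM8K data. The function names `extract_final_number` and `gsm8k_score` are hypothetical, and real evaluation harnesses may normalize answers differently.

```python
import re


def extract_final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    # Assumption: if a GSM8K-style '####' marker is present, the answer follows it.
    if "####" in text:
        text = text.split("####")[-1]
    # Match integers and decimals, allowing thousands separators such as 1,200.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gsm8k_score(predictions, references):
    """Percentage (0 to 100) of predictions whose final number matches the reference."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    correct = 0
    for pred, ref in zip(predictions, references):
        p = extract_final_number(pred)
        r = extract_final_number(ref)
        if p is not None and r is not None and abs(p - r) < 1e-6:
            correct += 1
    return 100.0 * correct / len(references)


# Toy data for illustration only; these are not real model outputs.
predictions = [
    "She has 18 eggs left. The answer is 18",
    "Total is $1,200.",
    "I think it is 7",
]
references = ["#### 18", "#### 1200", "#### 8"]
score = gsm8k_score(predictions, references)
```

Taking the last number in the model output is a common but lossy heuristic: it can misread outputs that restate intermediate values after the answer, which is one reason published scores for the same model can vary between evaluation setups.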
Publication
This benchmark was published in 2021 in the paper "Training Verifiers to Solve Math Word Problems" (Cobbe et al.).