GSM8K

mathematics | Pending Human Review

Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade school math word problems.

Published: 2021
Score Range: 0-100
Top Score: 96.4

GSM8K Leaderboard

Rank  Model              Provider      Score  Parameters                      Released    Type
1     Claude 3.5 Sonnet  Anthropic     96.4   -                               2024-06-20  Multimodal
2     Kimi K2            Moonshot AI   95     1T total, 32B activated         2025-07-11  Text
3     Claude 3 Opus      Anthropic     95     -                               2024-03-04  Multimodal
4     Nemotron 3 Nano    NVIDIA        92.34  31.6B (Total), ~3.2B (Active)   2025-12-15  Text
5     Claude 3 Sonnet    Anthropic     92.3   -                               2024-03-04  Multimodal
6     Qwen-2             Alibaba       89.5   72B                             2024-06-11  Text
7     DeepSeek-V3        DeepSeek      89.3   671B total, 37B activated       2024-12-26  Text
8     Claude 3 Haiku     Anthropic     88.9   -                               2024-03-04  Multimodal
9     Mixtral 8×22B      Mistral AI    88     141B (39B active)               2024-04-17  Text
10    Claude 2           Anthropic     88     ~130B                           2023-07-11  Text

A dash in the Parameters column means no parameter count is listed.

About GSM8K

Methodology

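GSM8K scores are reported on a scale of 0 to 100, where higher is better. By convention the score is accuracy: the percentage of test problems for which the model's final numeric answer matches the reference answer. This page does not state the prompting setup (for example, the number of few-shot examples) behind each entry, so scores from different sources may not be directly comparable. For details of the dataset and evaluation, refer to the original paper.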
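As a rough illustration of how a 0-100 accuracy score of this kind can be computed, here is a minimal Python sketch. It assumes that reference solutions end with a `#### <answer>` line (the format used in the published dataset) and that the model's answer is the last number in its output. The function names `extract_final_number` and `gsm8k_score` are hypothetical, and the answer-extraction rules used for the leaderboard entries above are not stated on this page and may differ.

```python
import re

# Matches integers and decimals, optionally negative, with optional thousands separators.
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number in `text` as a normalized string, or None if there is none."""
    # Assumption: if a "####" marker is present, the answer follows it.
    marker = text.rfind("####")
    if marker != -1:
        text = text[marker + 4:]
    matches = NUMBER_PATTERN.findall(text)
    if not matches:
        return None
    value = float(matches[-1].replace(",", ""))
    # Normalize so that "18", "18.0" and "1,000" vs "1000" compare equal.
    return str(int(value)) if value.is_integer() else str(value)


def gsm8k_score(predictions, references):
    """Percentage (0-100) of predictions whose final number matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = 0
    for prediction, reference in zip(predictions, references):
        expected = extract_final_number(reference)
        if expected is not None and extract_final_number(prediction) == expected:
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Made-up examples for illustration only; not taken from the dataset.
    example_predictions = [
        "9 eggs are left, so she earns 18 dollars.",
        "The total is 1,000.",
        "I think it is 7",
    ]
    example_references = [
        "16 - 3 - 4 = 9\n9 * 2 = 18\n#### 18",
        "#### 1000",
        "#### 8",
    ]
    print(round(gsm8k_score(example_predictions, example_references), 1))
```

Taking the last number in the output is a common but lossy heuristic: a model that states the right answer and then mentions another number afterwards would be marked wrong, which is one reason reported scores can vary between evaluation harnesses.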

Publication

This benchmark was published in 2021. Technical paper: Cobbe et al., "Training Verifiers to Solve Math Word Problems" (2021).

Related Benchmarks