MATH

mathematics
Pending Verification

A dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning.

Published: 2021
Score Range: 0-100
Top Score: 97.4

MATH Leaderboard

Rank  Model                  Provider     Score  Parameters                 Released    Type
1     Kimi K2                Moonshot AI  97.4   1T total, 32B activated    2025-07-11  Text
2     Claude 3.7 Sonnet      Anthropic    96.2   -                          2025-02-24  Multimodal
3     o1                     OpenAI       94.8   -                          2024-09-12  Multimodal
4     Gemini 2.0 Pro         Google       91.8   -                          2025-02-05  Multimodal
5     Gemini 2.0 Flash       Google       90.9   -                          2025-02-25  Multimodal
6     DeepSeek-V3            DeepSeek     90.2   671B total, 37B activated  2024-12-26  Text
7     Gemini 2.0 Flash-Lite  Google       86.8   -                          2025-02-25  Multimodal
8     GPT-4o                 OpenAI       76.6   -                          2024-05-13  Multimodal
9     Grok-2                 xAI          76.1   Unknown                    2024-08-13  Multimodal
10    Grok-2 mini            xAI          73     Unknown                    2024-08-13  Multimodal

About MATH

Methodology

MATH scores a model by the share of problems it answers correctly. Scores are reported on a scale of 0 to 100, where higher is better. In the original paper, a problem counts as correct only if the model's final answer exactly matches the reference answer after normalization. For the full scoring procedure, refer to the original paper.
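
As a rough illustration of how a 0 to 100 score of this kind can be computed, the sketch below grades final answers by exact match and reports the percentage correct. The function names `normalize` and `math_score` are hypothetical, and the whitespace-only normalization is a simplifying assumption; it is not the normalization used by the original paper or by this leaderboard.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Hypothetical normalization: drop all whitespace from the answer string.

    Real graders typically do more (e.g. unify LaTeX spellings); this is a
    deliberately minimal stand-in.
    """
    return "".join(answer.split())


def math_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage (0 to 100) of predictions matching their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    # Count problems whose normalized final answers are identical.
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Made-up answers for illustration only; two of the three match.
    preds = ["\\frac{1}{2}", "42", "x+1"]
    refs = ["\\frac{1}{2}", "41", "x + 1"]
    print(f"{math_score(preds, refs):.1f}")
```

Exact match is strict: mathematically equivalent answers written differently (for example `0.5` and `\frac{1}{2}`) would be marked wrong under this sketch, which is why practical graders normalize more aggressively.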

Publication

This benchmark was published in 2021; see the accompanying technical paper.

Related Benchmarks