MATH

mathematics

A dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning.

Published: 2021
Scale: 0-100
Top Score: 96.2

MATH Leaderboard

Rank  Model                  Provider   Score  Parameters                 Released    Type
1     Claude 3.7 Sonnet      Anthropic  96.2   -                          2025-02-24  Multimodal
2     o1                     OpenAI     94.8   -                          2024-09-12  Multimodal
3     Gemini 2.0 Pro         Google     91.8   -                          2025-02-05  Multimodal
4     Gemini 2.0 Flash       Google     90.9   -                          2025-02-25  Multimodal
5     DeepSeek-V3            DeepSeek   90.2   671B total, 37B activated  2024-12-26  Text
6     Gemini 2.0 Flash-Lite  Google     86.8   -                          2025-02-25  Multimodal
7     GPT-4o                 OpenAI     76.6   -                          2024-05-13  Multimodal
8     Claude 3.5 Sonnet      Anthropic  71.1   -                          2024-06-20  Multimodal
9     Claude 3.5 Haiku       Anthropic  69.4   -                          2024-10-22  Multimodal
10    Claude 3 Opus          Anthropic  60.1   -                          2024-03-04  Multimodal

About MATH

Methodology

MATH evaluates models on a scale of 0 to 100. Higher scores indicate better performance. For detailed information about the methodology, please refer to the original paper.
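The page does not spell out how the 0 to 100 score is derived. Assuming it is the percentage of problems whose final answer matches the reference answer (a common convention for this benchmark, but an assumption here), a minimal sketch might look like the following. The names `normalize_answer` and `math_score` are hypothetical, and the normalization is deliberately simplistic compared with what a real evaluation harness would do.

```python
def normalize_answer(answer: str) -> str:
    """Crude normalization (assumption): drop all whitespace."""
    return "".join(answer.split())


def math_score(predictions: list[str], references: list[str]) -> float:
    """Return the percentage (0-100) of predictions that exactly match
    their reference answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data, not real benchmark answers: two of three predictions match.
    preds = ["1/2", " 42 ", "x+1"]
    refs = ["1/2", "42", "x-1"]
    print(f"{math_score(preds, refs):.1f}")
```

Under this reading, a leaderboard score of 96.2 would mean roughly 96 of every 100 evaluated problems were answered correctly; the original paper remains the authority on the exact answer-matching rules.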

Publication

This benchmark was published in 2021.
