GPQA

Category: reasoning

GPQA (Graduate-Level Google-Proof Q&A Benchmark) evaluates advanced reasoning on graduate-level multiple-choice questions in biology, physics, and chemistry.

Published: 2023
Scale: 0-100
Top Score: 84.8

GPQA Leaderboard

Rank  Model              Provider   Score  Parameters            Released    Type
1     Claude 3.7 Sonnet  Anthropic  84.8   -                     2025-02-24  Multimodal
2     o3                 OpenAI     83.3   -                     2025-04-16  Multimodal
3     Gemini 2.5 Pro     Google     83     -                     2025-05-06  Multimodal
4     Gemini 2.5 Flash   Google     82.8   -                     2025-05-20  Multimodal
5     o4-mini            OpenAI     81.4   -                     2025-04-16  Multimodal
6     o1                 OpenAI     78     -                     2024-09-12  Multimodal
7     Claude Opus 4      Anthropic  74.9   -                     2025-05-22  Multimodal
8     o1-preview         OpenAI     73.3   -                     2024-09-12  Text
9     DeepSeek-R1        DeepSeek   71.5   671B (37B activated)  2025-01-20  Text
10    GPT-4.5            OpenAI     71.4   -                     2025-02-27  Multimodal

About GPQA

Description

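GPQA (Graduate-Level Google-Proof Q&A Benchmark) is a set of multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are meant to be "Google-proof": hard for skilled non-experts to answer correctly even with unrestricted web access.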

Methodology

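GPQA scores are reported on a scale of 0 to 100, and higher scores indicate better performance. Because the benchmark is multiple-choice with four options per question, a score is usually read as the percentage of questions answered correctly, which puts random guessing at about 25. For details of the methodology, refer to the original paper.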
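As a rough illustration of how a score on the 0 to 100 scale can be computed, the sketch below treats the score as plain accuracy over multiple-choice answers. The function name accuracy_score and the sample answers are hypothetical and are not taken from the benchmark or from any leaderboard submission. Actual evaluations may differ in prompting, answer extraction, and which question subset is used.

```python
from typing import Sequence


def accuracy_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage (0-100) of predictions that match the answer key.

    Hypothetical helper: assumes each item is a single option letter
    such as "A", "B", "C", or "D". Comparison ignores case and
    surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Made-up example data, not real benchmark questions or model outputs.
predicted = ["A", "c", "B", "D", "A"]
answer_key = ["A", "C", "D", "D", "B"]

score = accuracy_score(predicted, answer_key)
print(f"{score:.1f}")  # 3 of 5 correct -> 60.0
```

Under this reading, a leaderboard score of 84.8 would correspond to roughly 85 of every 100 questions answered correctly.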

Publication

This benchmark was published in 2023.

Related Benchmarks