GPQA

Category: reasoning | Status: Pending Verification

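GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on difficult, expert-written multiple-choice questions in biology, physics, and chemistry. The questions are designed so that they cannot easily be answered by searching the web.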

Published: 2023
Score Range: 0-100
Top Score: 87.5

GPQA Leaderboard

Rank | Model | Provider | Score | Parameters | Released | Type
1 | Grok 4 | xAI | 87.5 | Unknown | 2025-07-09 | Multimodal
2 | Claude 3.7 Sonnet | Anthropic | 84.8 |  | 2025-02-24 | Multimodal
3 | Grok 3 Mini | xAI | 84 | Unknown | 2025-02-19 | Multimodal
4 | o3 | OpenAI | 83.3 |  | 2025-04-16 | Multimodal
5 | Gemini 2.5 Pro | Google | 83 |  | 2025-05-06 | Multimodal
6 | Gemini 2.5 Flash | Google | 82.8 |  | 2025-05-20 | Multimodal
7 | o4-mini | OpenAI | 81.4 |  | 2025-04-16 | Multimodal
8 | GPT-OSS-120B | OpenAI | 80.1 | 117B total (5.1B active per token) | 2025-08-05 | Text
9 | o1 | OpenAI | 78 |  | 2024-09-12 | Multimodal
10 | Grok 3 | xAI | 75.4 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal

About GPQA

Methodology

GPQA consists of multiple-choice questions with four answer options each. A model's score is its accuracy: the percentage of questions it answers correctly, reported on a scale of 0 to 100 where higher is better. Uniform random guessing therefore scores about 25. For further details on the question set and evaluation protocol, refer to the original paper.
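Because each GPQA question has a single correct option, the score reduces to plain accuracy scaled to 0-100. The sketch below illustrates that arithmetic; it is not the official GPQA evaluation harness. It assumes the model's chosen option letter has already been extracted from its output and is compared to the gold letter by exact match. The function names `gpqa_score` and `chance_score` are hypothetical.

```python
from collections.abc import Sequence


def gpqa_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return accuracy as a percentage on the 0-100 scale.

    Hypothetical helper: assumes each prediction is an already-extracted
    option letter (e.g. "A" to "D") compared to the gold letter by exact match.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Count exact matches; each True adds 1 to the sum.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_score(num_options: int = 4) -> float:
    """Expected score from uniform random guessing among num_options choices."""
    return 100.0 / num_options


if __name__ == "__main__":
    preds = ["A", "C", "B", "D", "A", "B", "C", "A"]
    gold = ["A", "C", "D", "D", "A", "B", "B", "A"]
    print(gpqa_score(preds, gold))  # 6 of 8 correct -> 75.0
    print(chance_score())           # 25.0
```

Comparing a leaderboard score against `chance_score()` shows how far a model sits above the random-guessing floor of a four-option test.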

Publication

This benchmark was published in 2023. Technical paper: "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (Rein et al., 2023).

Related Benchmarks