GPQA

Category: Reasoning
Status: Pending Human Review

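GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on graduate-level multiple-choice questions in biology, physics, and chemistry. The questions are written by domain experts and are designed to be hard to answer correctly even with unrestricted web search.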

Published: 2023
Score Range: 0-100
Top Score: 91.9

GPQA Leaderboard

Rank  Model              Provider   Score  Parameters                           Released    Type
1     Gemini 3 Pro       Google     91.9   Proprietary                          2025-11-18  Multimodal
2     Grok 4             xAI        87.5   Unknown                              2025-07-09  Multimodal
3     Claude 3.7 Sonnet  Anthropic  84.8   -                                    2025-02-24  Multimodal
4     Grok 3 Mini        xAI        84     Unknown                              2025-02-19  Multimodal
5     o3                 OpenAI     83.3   -                                    2025-04-16  Multimodal
6     Gemini 2.5 Pro     Google     83     -                                    2025-05-06  Multimodal
7     Gemini 2.5 Flash   Google     82.8   -                                    2025-05-20  Multimodal
8     o4-mini            OpenAI     81.4   -                                    2025-04-16  Multimodal
9     GPT-OSS-120B       OpenAI     80.1   117B total (5.1B active per token)   2025-08-05  Text
10    o1                 OpenAI     78     -                                    2024-09-12  Multimodal

About GPQA

Methodology

GPQA questions are multiple choice with four answer options each. Scores are reported on a scale of 0 to 100 and correspond to the percentage of questions answered correctly, so higher scores indicate better performance and random guessing scores about 25. For details of the question set and evaluation protocol, refer to the original paper.
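
As a minimal sketch of how a score on the 0 to 100 scale can be computed, the snippet below treats the score as the percentage of multiple-choice questions answered correctly. That interpretation, the function name accuracy_score, and the sample answers are assumptions made for illustration and are not taken from this page; the original paper remains the authoritative description of the evaluation.

```python
from typing import Sequence


def accuracy_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage (0-100) of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Compare option letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical data: four questions with answer options labeled A to D.
answer_key = ["A", "C", "B", "D"]
model_output = ["A", "C", "D", "D"]

score = accuracy_score(model_output, answer_key)
print(f"{score:.1f}")  # 3 of 4 correct -> 75.0
```

Under this reading, a leaderboard entry of 84.8 would mean roughly 85 of every 100 questions were answered correctly.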

Publication

This benchmark was published in 2023. Technical paper: "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (Rein et al., 2023).

Related Benchmarks