GPQA

Category: reasoning
Status: Pending Human Review

GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on graduate-level multiple-choice questions in biology, physics, and chemistry, written by domain experts.

Published: 2023
Score Range: 0-100
Top Score: 91.9

GPQA Leaderboard

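| Rank | Model | Provider | Score | Parameters | Released | Type |
|------|-------|----------|-------|------------|----------|------|
| 1 | Gemini 3 Pro | Google | 91.9 | Proprietary | 2025-11-18 | Multimodal |
| 2 | Claude Opus 4.6 | Anthropic | 91.3 | Unreleased | 2026-02-05 | Multimodal |
| 3 | Grok 4 | xAI | 87.5 | Unknown | 2025-07-09 | Multimodal |
| 4 | Claude 3.7 Sonnet | Anthropic | 84.8 | - | 2025-02-24 | Multimodal |
| 5 | Grok 3 Mini | xAI | 84 | Unknown | 2025-02-19 | Multimodal |
| 6 | o3 | OpenAI | 83.3 | - | 2025-04-16 | Multimodal |
| 7 | Gemini 2.5 Pro | Google | 83 | - | 2025-05-06 | Multimodal |
| 8 | Gemini 2.5 Flash | Google | 82.8 | - | 2025-05-20 | Multimodal |
| 9 | o4-mini | OpenAI | 81.4 | - | 2025-04-16 | Multimodal |
| 10 | GPT-OSS-120B | OpenAI | 80.1 | 117B total (5.1B active per token) | 2025-08-05 | Text |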

About GPQA

Methodology

GPQA is a multiple-choice benchmark: each question has four answer options, and a model's score is the percentage of questions it answers correctly. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance and random guessing yields about 25. For detailed information about the scoring system and methodology, please refer to the original paper.
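As a rough illustration of how a score on the 0 to 100 scale can arise, the sketch below assumes the score is plain accuracy over multiple-choice questions, expressed as a percentage. The function name `accuracy_score` and the sample answers are hypothetical; they are not taken from the benchmark's official evaluation code.

```python
def accuracy_score(predictions, answers):
    """Return the percentage (0-100) of predictions that match the answer key.

    Assumption: the reported score is simple accuracy; the official
    harness may differ (e.g. in prompting or answer extraction).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("answer key must not be empty")
    # Count exact matches between predicted and correct option letters.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical answer key and model outputs for four questions.
answers = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]

score = accuracy_score(predictions, answers)
print(round(score, 1))  # 3 of 4 correct -> 75.0
```

Under this assumption, a leaderboard score of 91.9 would correspond to roughly 92 correct answers per 100 questions.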

Publication

This benchmark was published in 2023. Technical paper: "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (Rein et al., 2023).

Related Benchmarks