GPQA
reasoning | Pending Verification
GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on graduate-level multiple-choice questions in biology, physics, and chemistry.
Published: 2023
Score Range: 0-100
Top Score: 87.5
GPQA Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Grok 4 | xAI | 87.5 | Unknown | 2025-07-09 | Multimodal |
2 | Claude 3.7 Sonnet | Anthropic | 84.8 | | 2025-02-24 | Multimodal |
3 | Grok 3 Mini | xAI | 84 | Unknown | 2025-02-19 | Multimodal |
4 | o3 | OpenAI | 83.3 | | 2025-04-16 | Multimodal |
5 | Gemini 2.5 Pro | Google | 83 | | 2025-05-06 | Multimodal |
6 | Gemini 2.5 Flash | Google | 82.8 | | 2025-05-20 | Multimodal |
7 | o4-mini | OpenAI | 81.4 | | 2025-04-16 | Multimodal |
8 | GPT-OSS-120B | OpenAI | 80.1 | 117B total (5.1B active per token) | 2025-08-05 | Text |
9 | o1 | OpenAI | 78 | | 2024-09-12 | Multimodal |
10 | Grok 3 | xAI | 75.4 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal |
About GPQA
Methodology
GPQA is scored by accuracy: the percentage of multiple-choice questions a model answers correctly. Scores are reported on a scale of 0 to 100, where higher is better. Each question has four answer options, so random guessing scores about 25. For details on question construction and evaluation, refer to the original paper.
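As a rough illustration of how a 0 to 100 score could be derived, the sketch below assumes the score is plain accuracy over multiple-choice questions. The function name `accuracy_score` and the sample data are hypothetical and are not taken from the benchmark's own evaluation code; the original paper remains the authority on the exact protocol.

```python
from typing import Sequence


def accuracy_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage (0-100) of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    # Compare option letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical data: option letters chosen by a model vs. the answer key.
model_choices = ["A", "c", "B", "D"]
answer_key = ["A", "C", "D", "D"]
print(accuracy_score(model_choices, answer_key))  # 75.0 (3 of 4 correct)
```

Under this assumption, a leaderboard entry of 87.5 would correspond to answering 87.5% of the evaluated questions correctly.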
Publication
This benchmark was published in 2023. Technical paper: "GPQA: A Graduate-Level Google-Proof Q&A Benchmark".