GPQA
reasoning
GPQA (Graduate-Level Google-Proof Q&A Benchmark) evaluates advanced reasoning on expert-written, graduate-level multiple-choice questions in biology, physics, and chemistry.
Published: 2023
Scale: 0-100
Top Score: 84.8
GPQA Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Claude 3.7 Sonnet | Anthropic | 84.8 | | 2025-02-24 | Multimodal |
2 | o3 | OpenAI | 83.3 | | 2025-04-16 | Multimodal |
3 | Gemini 2.5 Pro | Google | 83 | | 2025-05-06 | Multimodal |
4 | Gemini 2.5 Flash | Google | 82.8 | | 2025-05-20 | Multimodal |
5 | o4-mini | OpenAI | 81.4 | | 2025-04-16 | Multimodal |
6 | o1 | OpenAI | 78 | | 2024-09-12 | Multimodal |
7 | Claude Opus 4 | Anthropic | 74.9 | | 2025-05-22 | Multimodal |
8 | o1-preview | OpenAI | 73.3 | | 2024-09-12 | Text |
9 | DeepSeek-R1 | DeepSeek | 71.5 | 671B (37B activated) | 2025-01-20 | Text |
10 | GPT-4.5 | OpenAI | 71.4 | | 2025-02-27 | Multimodal |
About GPQA
Description
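GPQA (Graduate-Level Google-Proof Q&A Benchmark) is a set of multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are designed to be "Google-proof": difficult for skilled non-experts to answer correctly even with unrestricted web access.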
Methodology
GPQA scores range from 0 to 100, and higher scores indicate better performance. Because the questions are multiple-choice, a score is naturally read as the percentage of questions answered correctly. For details on the methodology, refer to the original paper.
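As a rough illustration of how a 0 to 100 score of this kind can be computed, the sketch below treats the score as plain accuracy over multiple-choice answers. This is an assumption, not the leaderboard's documented procedure: the function name `accuracy_score`, the A-D answer labels, and the sample data are all hypothetical.

```python
from typing import Sequence


def accuracy_score(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage (0-100) of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Compare labels case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical data: five multiple-choice questions with options labeled A-D.
answer_key = ["B", "D", "A", "C", "B"]
model_output = ["B", "D", "C", "C", "a"]

score = accuracy_score(model_output, answer_key)
print(f"{score:.1f}")  # prints 60.0 (3 of 5 correct)
```

Real evaluations also have to extract the chosen option from free-form model output and may average over several runs; those steps are omitted here.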
Publication
This benchmark was published in 2023 in the paper "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (Rein et al.).