SimpleQA
knowledge
Pending Verification
A benchmark of simple but precise questions to test factual knowledge and reasoning.
Published: 2024
Score Range: 0-100
Top Score: 50.8
SimpleQA Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 50.8 | | 2025-05-06 | Multimodal |
2 | Gemini 2.0 Pro | Google | 44.3 | | 2025-02-05 | Multimodal |
3 | Grok 3 | xAI | 43.6 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal |
4 | Kimi K2 | Moonshot AI | 31 | 1T total, 32B activated | 2025-07-11 | Text |
5 | DeepSeek-R1 | DeepSeek | 30.1 | 671B (37B activated) | 2025-01-20 | Text |
6 | Gemini 2.0 Flash | Google | 29.9 | | 2025-02-25 | Multimodal |
7 | Gemini 2.5 Flash | Google | 26.9 | | 2025-05-20 | Multimodal |
8 | DeepSeek-V3 | DeepSeek | 24.9 | 671B total, 37B activated | 2024-12-26 | Text |
9 | Gemini 2.0 Flash-Lite | Google | 21.7 | | 2025-02-25 | Multimodal |
10 | Grok 3 Mini | xAI | 21.7 | Unknown | 2025-02-19 | Multimodal |
About SimpleQA
Methodology
SimpleQA scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
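As a rough illustration of how a 0 to 100 score of this kind could be computed, the sketch below assumes the score is simply the percentage of questions graded correct. The function name `simpleqa_style_score` and the grade labels (`correct`, `incorrect`, `not_attempted`) are hypothetical choices for this example; this page does not specify the exact formula, so consult the original paper for the authoritative definition.

```python
from collections import Counter


def simpleqa_style_score(grades):
    """Return the percentage of answers graded 'correct', on a 0-100 scale.

    Assumption: the leaderboard score is percent correct over all questions.
    The grade labels used here are illustrative, not taken from this page.
    """
    if not grades:
        raise ValueError("grades must not be empty")
    counts = Counter(grades)
    return 100.0 * counts["correct"] / len(grades)


# Hypothetical grades for four questions.
grades = ["correct", "incorrect", "not_attempted", "correct"]
score = simpleqa_style_score(grades)
print(score)  # 50.0
```

Under this assumption, unanswered questions lower the score just as wrong answers do, which is one reason two models with similar accuracy on attempted questions can still rank differently.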
Publication
This benchmark was published in 2024.