SimpleQA

Category: Knowledge | Status: Pending Verification

A benchmark of short, fact-seeking questions, each with a single verifiable answer, designed to test the factual knowledge of language models.

Published: 2024
Score Range: 0-100
Top Score: 50.8

SimpleQA Leaderboard

Rank | Model | Provider | Score | Parameters | Released | Type
-----|-------|----------|-------|------------|----------|-----
1 | Gemini 2.5 Pro | Google | 50.8 | Not listed | 2025-05-06 | Multimodal
2 | Gemini 2.0 Pro | Google | 44.3 | Not listed | 2025-02-05 | Multimodal
3 | Grok 3 | xAI | 43.6 | Unknown (multi-trillion estimated) | 2025-02-19 | Multimodal
4 | Kimi K2 | Moonshot AI | 31 | 1T total, 32B activated | 2025-07-11 | Text
5 | DeepSeek-R1 | DeepSeek | 30.1 | 671B total, 37B activated | 2025-01-20 | Text
6 | Gemini 2.0 Flash | Google | 29.9 | Not listed | 2025-02-25 | Multimodal
7 | Gemini 2.5 Flash | Google | 26.9 | Not listed | 2025-05-20 | Multimodal
8 | DeepSeek-V3 | DeepSeek | 24.9 | 671B total, 37B activated | 2024-12-26 | Text
9 | Gemini 2.0 Flash-Lite | Google | 21.7 | Not listed | 2025-02-25 | Multimodal
10 | Grok 3 Mini | xAI | 21.7 | Unknown | 2025-02-19 | Multimodal

About SimpleQA

Methodology

SimpleQA scores models on short, fact-seeking questions that each have a single verifiable answer. In the original paper, each model response is graded as correct, incorrect, or not attempted. Scores on this page are reported on a scale of 0 to 100, where higher is better. For full details of the grading procedure and the metrics derived from it, see the original paper.
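The SimpleQA paper grades each response as correct, incorrect, or not attempted, and summarizes the grades as overall percent correct, percent correct among attempted questions, and an F-score (the harmonic mean of those two). The snippet below is a minimal sketch of that aggregation, assuming the per-question grades have already been produced (the paper uses a model-based grader, which is not reproduced here). The function name `simpleqa_metrics` and the label strings are hypothetical, and this page does not state which of these metrics the Score column above reports.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label strings for the three grades described in the SimpleQA paper.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"


def simpleqa_metrics(grades: Iterable[str]) -> dict[str, float]:
    """Aggregate per-question grades into SimpleQA-style metrics on a 0-100 scale.

    Sketch only: assumes grades were already assigned by some grader.
    """
    counts = Counter(grades)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no grades supplied")
    unknown = set(counts) - {CORRECT, INCORRECT, NOT_ATTEMPTED}
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")

    # Questions the model answered at all (right or wrong).
    attempted = counts[CORRECT] + counts[INCORRECT]
    overall_correct = counts[CORRECT] / total
    correct_given_attempted = counts[CORRECT] / attempted if attempted else 0.0

    # Harmonic mean of the two rates; defined as 0 when both are 0.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": 100 * overall_correct,
        "correct_given_attempted": 100 * correct_given_attempted,
        "f_score": 100 * f_score,
    }


if __name__ == "__main__":
    demo = ["correct"] * 5 + ["incorrect"] * 3 + ["not_attempted"] * 2
    print(simpleqa_metrics(demo))
```

With 5 correct, 3 incorrect, and 2 not attempted, overall correct is 50.0 and correct given attempted is 62.5. Separating the two rates shows why abstaining matters: a model that declines to answer uncertain questions can raise its accuracy on attempted questions without raising its overall percent correct.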

Publication

This benchmark was published in 2024 in the technical paper "Measuring short-form factuality in large language models" (Wei et al., OpenAI).

Related Benchmarks