DROP
reasoning
Discrete Reasoning Over Paragraphs (DROP) requires models to resolve references in a passage and perform discrete operations.
Published: 2019
Scale: 0-100
Top Score: 92.2
DROP Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | DeepSeek-R1 | DeepSeek | 92.2 | 671B total, 37B activated | 2025-01-20 | Text |
2 | DeepSeek-V3 | DeepSeek | 91.6 | 671B total, 37B activated | 2024-12-26 | Text |
3 | Claude 3.5 Sonnet | Anthropic | 87.1 | | 2024-06-20 | Multimodal |
4 | GPT-4o | OpenAI | 83.4 | | 2024-05-13 | Multimodal |
5 | Claude 3.5 Haiku | Anthropic | 83.1 | | 2024-10-22 | Multimodal |
6 | Claude 3 Opus | Anthropic | 83.1 | | 2024-03-04 | Multimodal |
7 | Claude 3 Sonnet | Anthropic | 78.9 | | 2024-03-04 | Multimodal |
8 | Claude 3 Haiku | Anthropic | 78.4 | | 2024-03-04 | Multimodal |
9 | Gemma 3 | | 71.2 | 1B, 4B, 12B, 27B | 2025-03-12 | Multimodal |
About DROP
Description
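Discrete Reasoning Over Paragraphs (DROP) is a reading-comprehension benchmark. To answer a question, a model must resolve references to one or more places in a passage and then perform discrete operations over them, such as addition, counting, or sorting.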
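To make "discrete operations" concrete, here is a minimal Python sketch of the kind of reasoning a DROP-style question calls for. The passage, team names, and helper functions are all hypothetical and written for illustration; they are not taken from the DROP dataset, and the regular-expression lookup only stands in for the reference resolution a real model would have to do.

```python
import re

# Hypothetical passage written for illustration; not from the DROP dataset.
PASSAGE = (
    "The Hawks scored a 3-yard touchdown in the first quarter. "
    "The Owls answered with a 25-yard field goal. "
    "In the second quarter the Hawks added a 41-yard field goal, "
    "and the Owls closed the half with a 12-yard touchdown."
)


def yards_for(passage, play):
    """Resolve every mention of `play` in the passage to its yardage."""
    pattern = r"(\d+)-yard " + re.escape(play)
    return [int(y) for y in re.findall(pattern, passage)]


def count_plays(passage, play):
    """Counting: how many plays of this kind are mentioned?"""
    return len(yards_for(passage, play))


def yard_gap(passage, play):
    """Subtraction: how many yards longer was the longest play than the shortest?"""
    yards = yards_for(passage, play)
    return max(yards) - min(yards)


def sorted_scoring_yards(passage):
    """Sorting: order all scoring plays by yardage, shortest first."""
    return sorted(int(y) for y in re.findall(r"(\d+)-yard", passage))


if __name__ == "__main__":
    print(count_plays(PASSAGE, "field goal"))   # "How many field goals were kicked?"
    print(yard_gap(PASSAGE, "field goal"))      # "How many yards longer was the longest field goal?"
    print(sorted_scoring_yards(PASSAGE))        # "List the scoring plays from shortest to longest."
```

Each helper pairs one lookup step (finding the relevant spans in the passage) with one discrete operation (count, subtract, sort), which is the combination the description above refers to.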
Methodology
DROP evaluates models on a scale of 0 to 100. Higher scores indicate better performance. For detailed information about the methodology, please refer to the original paper.
Publication
This benchmark was published in 2019.