DROP

Category: reasoning
Status: Pending Verification

Discrete Reasoning Over Paragraphs (DROP) is a reading comprehension benchmark that requires models to resolve references in a passage and perform discrete operations over them, such as addition, counting, or sorting.

Published: 2019
Score Range: 0-100
Top Score: 92.2

DROP Leaderboard

Rank  Model              Provider   Score  Parameters                 Released    Type
1     DeepSeek-R1        DeepSeek   92.2   671B (37B activated)       2025-01-20  Text
2     DeepSeek-V3        DeepSeek   91.6   671B total, 37B activated  2024-12-26  Text
3     Claude 3.5 Sonnet  Anthropic  87.1   -                          2024-06-20  Multimodal
4     GPT-4o             OpenAI     83.4   -                          2024-05-13  Multimodal
5     Claude 3.5 Haiku   Anthropic  83.1   -                          2024-10-22  Multimodal
6     Claude 3 Opus      Anthropic  83.1   -                          2024-03-04  Multimodal
7     Claude 3 Sonnet    Anthropic  78.9   -                          2024-03-04  Multimodal
8     Claude 3 Haiku     Anthropic  78.4   -                          2024-03-04  Multimodal
9     Gemma 3            Google     71.2   1B, 4B, 12B, 27B           2025-03-12  Multimodal

About DROP

Methodology

DROP scores are reported on a scale of 0 to 100, where higher scores indicate better performance. The original paper defines two metrics: exact match (EM) and a numeracy-focused F1. For details of the scoring methodology, refer to the original paper.
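To make the 0 to 100 scale concrete, the following is a minimal sketch of how exact-match and token-overlap F1 scores are commonly computed for reading comprehension answers. It is a simplified illustration, not the official DROP evaluation script: the function names (`normalize`, `exact_match`, `token_f1`, `score`) and the normalization rules are assumptions, and the official scorer additionally handles multi-span answers and number matching.

```python
import string
from collections import Counter

ARTICLES = {"a", "an", "the"}


def _is_number(token):
    """Return True if the token parses as a number (kept intact, e.g. '3.5')."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def normalize(text):
    """Lowercase, trim punctuation from non-numeric tokens, drop articles."""
    tokens = []
    for token in text.lower().split():
        if not _is_number(token):
            token = token.strip(string.punctuation)
        if token and token not in ARTICLES:
            tokens.append(token)
    return tokens


def exact_match(prediction, gold):
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def token_f1(prediction, gold):
    """Bag-of-tokens F1 between a prediction and one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        return 1.0 if pred_tokens == gold_tokens else 0.0
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, references):
    """Return (EM, F1) on a 0-100 scale.

    Each entry of `references` is a list of acceptable gold answers;
    the best-scoring gold answer is used for each prediction.
    """
    em_total = 0.0
    f1_total = 0.0
    for prediction, golds in zip(predictions, references):
        em_total += max(exact_match(prediction, g) for g in golds)
        f1_total += max(token_f1(prediction, g) for g in golds)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n
```

As a usage example, `score(["The 3 yards", "Tom Brady"], [["3 yards"], ["Brady", "Thomas Brady"]])` returns a pair of averaged percentages: the first prediction matches its gold answer after normalization, while the second earns only partial F1 credit. Taking the maximum over several gold answers reflects the common practice of accepting any annotator's answer.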

Publication

This benchmark was published in 2019. See the technical paper.

Related Benchmarks