DROP

reasoning

Discrete Reasoning Over Paragraphs (DROP) requires models to resolve references in a passage and perform discrete operations.

Published: 2019
Scale: 0-100
Top Score: 92.2

DROP Leaderboard

Rank  Model              Provider   Score  Parameters                 Released    Type
1     DeepSeek-R1        DeepSeek   92.2   671B (37B activated)       2025-01-20  Text
2     DeepSeek-V3        DeepSeek   91.6   671B total, 37B activated  2024-12-26  Text
3     Claude 3.5 Sonnet  Anthropic  87.1   -                          2024-06-20  Multimodal
4     GPT-4o             OpenAI     83.4   -                          2024-05-13  Multimodal
5     Claude 3.5 Haiku   Anthropic  83.1   -                          2024-10-22  Multimodal
6     Claude 3 Opus      Anthropic  83.1   -                          2024-03-04  Multimodal
7     Claude 3 Sonnet    Anthropic  78.9   -                          2024-03-04  Multimodal
8     Claude 3 Haiku     Anthropic  78.4   -                          2024-03-04  Multimodal
9     Gemma 3            Google     71.2   1B, 4B, 12B, 27B           2025-03-12  Multimodal

About DROP

Description

DROP is an English reading comprehension benchmark. To answer a question, a model must resolve references in a passage and then perform discrete operations over them, such as addition, counting, or sorting.

Methodology

DROP scores range from 0 to 100, and higher scores indicate better performance. See the original paper for details of the methodology.
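To illustrate how a 0 to 100 score can arise, here is a minimal sketch. Reading comprehension benchmarks such as DROP are commonly scored with exact match and token-overlap F1, averaged over questions and multiplied by 100. The Python below is a simplified illustration under that assumption, not the official DROP evaluator, which also treats numbers, dates, and multi-span answers specially. The function names are hypothetical, and this page does not state which metric the leaderboard scores use.

```python
import string
from collections import Counter

ARTICLES = {"a", "an", "the"}


def normalize(text):
    """Lowercase, trim punctuation from token edges, and drop articles."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(string.punctuation)  # keeps inner characters, e.g. "3.5"
        if tok and tok not in ARTICLES:
            tokens.append(tok)
    return tokens


def exact_match(prediction, gold):
    """1.0 if the normalized answers are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def token_f1(prediction, gold):
    """Bag-of-tokens F1 between a predicted and a gold answer."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return 1.0 if pred == ref else 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def benchmark_score(pairs, metric=token_f1):
    """Average a per-question metric and rescale it to the 0-100 range."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (prediction, gold) pair")
    return 100.0 * sum(metric(p, g) for p, g in pairs) / len(pairs)
```

For example, `benchmark_score([("7", "7"), ("John Smith", "John")])` averages a perfect answer with a partially correct one, while passing `metric=exact_match` gives the partial answer no credit.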

Publication

This benchmark was published in 2019.

Related Benchmarks