WinoGrande
Category: Reasoning
Status: Pending Human Review
An adversarial Winograd Schema Challenge at scale.
Published: 2019
Score Range: 0-100
Top Score: 84.9
WinoGrande Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-V3 | DeepSeek | 84.9 | 671B total, 37B activated | 2024-12-26 | Text |
About WinoGrande
Methodology
WinoGrande scores are reported on a scale of 0 to 100, where higher is better. In the original paper, each problem is a sentence with one blank and two candidate fillers, and the headline metric is accuracy, so a score here is most naturally read as the percentage of problems answered correctly. For details of the scoring system and dataset construction, refer to the original paper.
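As a rough illustration of how a 0 to 100 score could be derived, the sketch below computes percent accuracy over two-option fill-in-the-blank items. It assumes the leaderboard score is plain accuracy, which this page does not state explicitly. The `Item` class, the `percent_accuracy` function, and the two sample sentences are hypothetical and are not taken from the dataset or from any official evaluation harness.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Item:
    """One two-option fill-in-the-blank problem (hypothetical layout)."""
    sentence: str  # contains a single "_" marking the blank
    option1: str
    option2: str
    answer: int    # 1 if option1 is correct, 2 if option2 is correct


def percent_accuracy(items: Sequence[Item], predictions: Sequence[int]) -> float:
    """Return the share of correct predictions on a 0-100 scale.

    Assumption: the reported score is simple accuracy times 100.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return 100.0 * correct / len(items)


# Illustrative items written for this example, not drawn from WinoGrande.
SAMPLE_ITEMS = [
    Item("The trophy did not fit in the suitcase because the _ was too large.",
         "trophy", "suitcase", 1),
    Item("The trophy did not fit in the suitcase because the _ was too small.",
         "trophy", "suitcase", 2),
]

if __name__ == "__main__":
    # A model that always picks option 1 gets one of the two items right.
    print(percent_accuracy(SAMPLE_ITEMS, [1, 1]))
```

Under this reading, the top score of 84.9 would correspond to roughly 85 of every 100 problems answered correctly. Because each problem has two options, random guessing would land near 50 rather than 0.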
Publication
This benchmark was published in 2019.