SWE-bench

Category: coding

Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks.

Published: 2023
Scale: 0-100
Top Score: 72.7

SWE-bench Leaderboard

| Rank | Model | Provider | Score | Parameters | Released | Type |
|------|-------|----------|-------|------------|----------|------|
| 1 | Claude Sonnet 4 | Anthropic | 72.7 | | 2025-05-22 | Multimodal |
| 2 | Claude Opus 4 | Anthropic | 72.5 | | 2025-05-22 | Multimodal |
| 3 | Claude 3.7 Sonnet | Anthropic | 70.3 | | 2025-02-24 | Multimodal |
| 4 | o3 | OpenAI | 69.1 | | 2025-04-16 | Multimodal |
| 5 | o4-mini | OpenAI | 68.1 | | 2025-04-16 | Multimodal |
| 6 | Gemini 2.5 Pro | Google | 63.2 | | 2025-05-06 | Multimodal |
| 7 | Gemini 2.5 Flash | Google | 60.4 | | 2025-05-20 | Multimodal |
| 8 | DeepSeek-R1 | DeepSeek | 49.2 | 671B (37B activated) | 2025-01-20 | Text |
| 9 | Claude 3.5 Haiku | Anthropic | 40.6 | | 2024-10-22 | Multimodal |
| 10 | GPT-4.5 | OpenAI | 38.0 | | 2025-02-27 | Multimodal |

About SWE-bench

Description

Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks.

Methodology

SWE-bench scores models on a 0-100 scale: the percentage of real GitHub issues for which the model's generated patch resolves the issue, as verified by the repository's test suite. Higher scores indicate better performance. For detailed information about the methodology, please refer to the original paper.
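As an illustrative sketch (not the official evaluation harness), the headline score reduces to a resolved-rate percentage once each task instance has been judged pass or fail. The function and data below are hypothetical; the real harness applies each model-generated patch to the target repository and runs its tests to produce the per-task outcome.

```python
# Illustrative sketch only: assumes per-task results have already been
# reduced to a boolean "resolved" flag by a test-running harness.

def swe_bench_score(results: list[bool]) -> float:
    """Return the percentage of task instances resolved (0-100 scale)."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)

# Hypothetical example: 8 of 11 task instances resolved.
print(round(swe_bench_score([True] * 8 + [False] * 3), 1))  # prints 72.7
```

A model that resolves none of the instances scores 0.0; one that resolves all of them scores 100.0.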

Publication

This benchmark was published in 2023.
