Aider-Polyglot

Category: coding

Tests models on their ability to write code in multiple programming languages.

Published: 2024
Scale: 0-100
Top Score: 81.3

Aider-Polyglot Leaderboard

Rank  Model             Provider  Score  Parameters                 Released    Type
1     o3                OpenAI    81.3   -                          2025-04-16  Multimodal
2     Gemini 2.5 Pro    Google    76.5   -                          2025-05-06  Multimodal
3     o4-mini           OpenAI    68.9   -                          2025-04-16  Multimodal
4     Gemini 2.5 Flash  Google    61.9   -                          2025-05-20  Multimodal
5     Qwen-3            Alibaba   61.8   235B (22B active)          2025-04-29  Text
6     DeepSeek-R1       DeepSeek  53.3   671B (37B activated)       2025-01-20  Text
7     DeepSeek-V3       DeepSeek  49.6   671B total, 37B activated  2024-12-26  Text

About Aider-Polyglot

Methodology

Aider-Polyglot scores models on a scale of 0 to 100, where a higher score indicates better performance. For details of the methodology, refer to the original publication.

Publication

This benchmark was published in 2024.

Related Benchmarks