LiveCodeBench v6

Category: coding
Status: Pending Human Review

A benchmark for evaluating LLMs on code generation tasks drawn from programming contests.

Published: 2024
Score Range: 0-100
Top Score: 90.7

LiveCodeBench v6 Leaderboard

Rank  Model             Provider     Score  Parameters                 Released    Type
1     Gemini 3 Pro      Google       90.7   Proprietary                2025-11-18  Multimodal
2     Kimi K2           Moonshot AI  83.1   1T total, 32B activated    2025-07-11  Text
3     DeepSeek-V3       DeepSeek     46.9   671B total, 37B activated  2024-12-26  Text
4     Claude Opus 4     Anthropic    44.7   -                          2025-05-22  Multimodal
5     Gemini 2.5 Flash  Google       44.7   -                          2025-05-20  Multimodal
6     GPT-4.1           OpenAI       44.7   -                          2025-04-14  Multimodal
7     Qwen-3            Alibaba      37     235B (22B active)          2025-04-29  Text
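
The leaderboard lists three models with the same score (44.7) under consecutive ranks 4, 5, and 6, and the page does not say how ties are ordered. The sketch below shows one way to hold the rows as records and assign tie-aware ranks. The `Entry` class and the `competition_ranks` helper are hypothetical names introduced for illustration, and the shared-rank rule is an assumption, not the site's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    provider: str
    score: float


# Scores copied from the leaderboard rows.
LEADERBOARD = [
    Entry("Gemini 3 Pro", "Google", 90.7),
    Entry("Kimi K2", "Moonshot AI", 83.1),
    Entry("DeepSeek-V3", "DeepSeek", 46.9),
    Entry("Claude Opus 4", "Anthropic", 44.7),
    Entry("Gemini 2.5 Flash", "Google", 44.7),
    Entry("GPT-4.1", "OpenAI", 44.7),
    Entry("Qwen-3", "Alibaba", 37.0),
]


def competition_ranks(entries):
    """Return (rank, entry) pairs sorted by score, highest first.

    Assumption: tied scores share the lowest applicable rank
    (so three entries tied for 4th are all ranked 4, and the next is 7).
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    ranked = []
    for position, entry in enumerate(ordered, start=1):
        if ranked and ranked[-1][1].score == entry.score:
            rank = ranked[-1][0]  # same score as previous row: reuse its rank
        else:
            rank = position
        ranked.append((rank, entry))
    return ranked


if __name__ == "__main__":
    for rank, entry in competition_ranks(LEADERBOARD):
        print(f"{rank:>2}  {entry.model:<18} {entry.score:5.1f}")
```

Under this rule the three 44.7 entries would all be ranked 4 and Qwen-3 would be ranked 7. The page instead numbers them 4 through 6, which may reflect a secondary sort key that it does not state.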

About LiveCodeBench v6

Methodology

LiveCodeBench v6 reports scores on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system and methodology, refer to the original paper.
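
The page does not state which metric produces the 0 to 100 score. As a minimal sketch, assuming each problem is simply solved or not solved and the score is the percentage solved, the calculation might look like the following. The function name `score_0_100` is hypothetical, and the pass-rate formula is an assumption rather than the benchmark's documented method.

```python
def score_0_100(results):
    """Return the percentage of solved problems on a 0-100 scale.

    `results` is a list of booleans, one per problem (True = solved).
    Assumption: the score is a plain pass rate; the page does not confirm this.
    """
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Three of four problems solved.
    print(score_0_100([True, True, False, True]))  # prints 75.0
```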

Publication

This benchmark was published in 2024.

Related Benchmarks