LiveCodeBench v6
Category: coding
A benchmark for evaluating LLMs on code generation tasks drawn from recent programming contests.
Published: 2024
Score Range: 0-100
Top Score: 90.7
LiveCodeBench v6 Leaderboard
| Rank | Model | Provider | Score | Parameters | Released | Type |
|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | 90.7 | Proprietary | 2025-11-18 | Multimodal |
| 2 | Kimi K2 | Moonshot AI | 83.1 | 1T total, 32B activated | 2025-07-11 | Text |
| 3 | DeepSeek-V3 | DeepSeek | 46.9 | 671B total, 37B activated | 2024-12-26 | Text |
| 4 | Claude Opus 4 | Anthropic | 44.7 | | 2025-05-22 | Multimodal |
| 5 | Gemini 2.5 Flash | Google | 44.7 | | 2025-05-20 | Multimodal |
| 6 | GPT-4.1 | OpenAI | 44.7 | | 2025-04-14 | Multimodal |
| 7 | Qwen-3 | Alibaba | 37 | 235B total, 22B activated | 2025-04-29 | Text |
About LiveCodeBench v6
Methodology
LiveCodeBench v6 evaluates models on code generation problems drawn from recent programming contests. Scores are reported on a scale of 0 to 100, corresponding to the percentage of problems solved, with higher scores indicating better performance. For details on problem selection and scoring, refer to the original paper.
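As a minimal sketch of how a percentage-of-problems-solved score of this kind can be computed, the snippet below illustrates pass@1-style scoring: each problem gets one generated solution, and the score is the share of problems whose solution passes all tests, scaled to 0-100. The function names and toy data are hypothetical and are not taken from the official LiveCodeBench harness.

```python
"""Illustrative pass@1-style scoring sketch; not the official LiveCodeBench harness."""
from typing import Callable, Sequence


def pass_at_1_score(problems: Sequence[dict],
                    generate: Callable[[dict], str],
                    passes_all_tests: Callable[[dict, str], bool]) -> float:
    """Return the percentage of problems (0-100) whose single generated
    solution passes every test case; higher is better."""
    solved = sum(1 for p in problems if passes_all_tests(p, generate(p)))
    return 100.0 * solved / len(problems)


if __name__ == "__main__":
    # Toy data: three problems, and a dummy "model" that only solves the first two.
    toy_problems = [{"id": i} for i in range(3)]
    toy_generate = lambda p: f"print('solution for problem {p['id']}')"
    toy_check = lambda p, code: p["id"] < 2  # stand-in for running hidden tests
    print(pass_at_1_score(toy_problems, toy_generate, toy_check))  # ~66.7
```

In a real harness, the test-checking step would execute each generated program against the problem's hidden test cases in a sandboxed environment rather than using a stand-in predicate.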
Publication
This benchmark was published in 2024.