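Evaluates models on coding problems that are collected continuously from competitive programming sites, so that newly published problems are unlikely to appear in a model's training data.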

LiveCodeBench-v5

xAI's latest-generation model, with enhanced mathematical reasoning and significant improvements on competition-level mathematics benchmarks. Features 2x faster end-to-end latency, supports 5 different voices, and reaches 10x the daily user seconds of previous models.

Grok 4

Gemini 2.5 Pro is capable of reasoning through its thoughts before responding, resulting in enhanced performance and improved accuracy. Features Deep Think, an enhanced reasoning mode, and native audio outputs that capture subtle nuances of speech.

Gemini 2.5 Pro

Third-generation Qwen model with hybrid reasoning that can switch between thinking and non-thinking modes. Trained on 36 trillion tokens (double that of Qwen2.5), with support for 119 languages and dialects. Available as 6 dense models (0.6B to 32B parameters) and 2 MoE models (30B total with 3B active, and 235B total with 22B active).

Qwen-3
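
The two MoE configurations quoted above can be compared by the share of parameters active per token. The snippet below is a small illustrative calculation using only the figures in the description; the dictionary keys and the `active_fraction` helper are hypothetical names, not part of any Qwen tooling.

```python
# Parameter counts (in billions) as quoted in the Qwen-3 description.
# Keys are illustrative labels, not official model names.
MOE_CONFIGS = {
    "30B total / 3B active": (30.0, 3.0),
    "235B total / 22B active": (235.0, 22.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for label, (total_b, active_b) in MOE_CONFIGS.items():
        print(f"{label}: {active_fraction(total_b, active_b):.1%} active")
```

Both configurations activate roughly a tenth of their parameters per token.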

Improved across key benchmarks for reasoning, multimodality, code and long context while getting even more efficient. Best for fast performance on complex tasks.

Gemini 2.5 Flash

xAI's most advanced model yet, blending superior reasoning with extensive pretraining knowledge. Trained on the Colossus supercluster with 10x the compute of previous state-of-the-art models. Features test-time compute and reasoning capabilities through reinforcement learning, allowing it to think for seconds to minutes while correcting errors and exploring alternatives. Achieved an Elo score of 1402 in the Chatbot Arena.

Grok 3

A smaller, more efficient version of Grok 3 from xAI. Represents a new frontier in cost-efficient reasoning, particularly strong on STEM tasks that don't require as much world knowledge. Also features test-time compute and reasoning capabilities through the Grok 3 mini (Think) variant, achieving impressive performance on mathematical and coding benchmarks while being more resource-efficient than the full Grok 3 model.

Grok 3 Mini

Google's best model yet for coding performance and complex prompts, with better understanding of and reasoning about world knowledge than any previous release. Features a 2 million token context window and the ability to call tools such as Google Search and code execution.

Gemini 2.0 Pro

Next iteration of Gemini, released in three versions (Flash, Flash-Lite, Pro). Represents Google's state-of-the-art multimodal model, likely incorporating Mixture-of-Experts at unprecedented scale and targeting dominance in both text and image understanding.

Gemini 2.0 Flash

Google's most cost-efficient and fastest 2.5 model yet. Higher quality than 2.0 Flash-Lite on coding, math, science, reasoning and multimodal benchmarks. Excels at high-volume, latency-sensitive tasks like translation and classification.

Gemini 2.5 Flash-Lite

Best for cost-efficient performance. Better quality than 1.5 Flash, at the same speed and cost. Features a 1 million token context window and multimodal input.

Gemini 2.0 Flash-Lite

Rank	Model	Provider	Score	Parameters	Released	Type
1	Grok 4	xAI	79.4	Unknown	2025-07-09	Multimodal
2	Gemini 2.5 Pro	Google	75.6		2025-05-06	Multimodal
3	Qwen-3	Alibaba	70.7	235B (22B active)	2025-04-29	Text
4	Gemini 2.5 Flash	Google	63.9		2025-05-20	Multimodal
5	Grok 3	xAI	57	Unknown (multi-trillion estimated)	2025-02-19	Multimodal
6	Grok 3 Mini	xAI	41.5	Unknown	2025-02-19	Multimodal
7	Gemini 2.0 Pro	Google	36		2025-02-05	Multimodal
8	Gemini 2.0 Flash	Google	34.5		2025-02-25	Multimodal
9	Gemini 2.5 Flash-Lite	Google	34.3		2025-06-17	Multimodal
10	Gemini 2.0 Flash-Lite	Google	28.9		2025-02-25	Multimodal
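
For readers who want to work with the table programmatically, here is a minimal sketch that parses the tab-separated rows above and summarizes scores per provider. The rows are copied from the table; the function names (`load_rows`, `best_by_provider`, `mean_by_provider`) are hypothetical helpers introduced for illustration, and the sketch assumes the page is available as tab-separated text.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the leaderboard table; fields are tab-separated.
LEADERBOARD_TSV = "\n".join([
    "Rank\tModel\tProvider\tScore\tParameters\tReleased\tType",
    "1\tGrok 4\txAI\t79.4\tUnknown\t2025-07-09\tMultimodal",
    "2\tGemini 2.5 Pro\tGoogle\t75.6\t\t2025-05-06\tMultimodal",
    "3\tQwen-3\tAlibaba\t70.7\t235B (22B active)\t2025-04-29\tText",
    "4\tGemini 2.5 Flash\tGoogle\t63.9\t\t2025-05-20\tMultimodal",
    "5\tGrok 3\txAI\t57\tUnknown (multi-trillion estimated)\t2025-02-19\tMultimodal",
    "6\tGrok 3 Mini\txAI\t41.5\tUnknown\t2025-02-19\tMultimodal",
    "7\tGemini 2.0 Pro\tGoogle\t36\t\t2025-02-05\tMultimodal",
    "8\tGemini 2.0 Flash\tGoogle\t34.5\t\t2025-02-25\tMultimodal",
    "9\tGemini 2.5 Flash-Lite\tGoogle\t34.3\t\t2025-06-17\tMultimodal",
    "10\tGemini 2.0 Flash-Lite\tGoogle\t28.9\t\t2025-02-25\tMultimodal",
])


def load_rows(tsv_text: str) -> list[dict]:
    """Parse the table and convert Rank and Score to numbers."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        row["Rank"] = int(row["Rank"])
        row["Score"] = float(row["Score"])
        rows.append(row)
    return rows


def best_by_provider(rows: list[dict]) -> dict[str, dict]:
    """Return the highest-scoring row for each provider."""
    best: dict[str, dict] = {}
    for row in rows:
        current = best.get(row["Provider"])
        if current is None or row["Score"] > current["Score"]:
            best[row["Provider"]] = row
    return best


def mean_by_provider(rows: list[dict]) -> dict[str, float]:
    """Return the mean score of each provider's listed models."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["Provider"]].append(row["Score"])
    return {provider: sum(vals) / len(vals) for provider, vals in scores.items()}


if __name__ == "__main__":
    rows = load_rows(LEADERBOARD_TSV)
    for provider, row in best_by_provider(rows).items():
        print(f"{provider}: best is {row['Model']} ({row['Score']})")
    for provider, mean in mean_by_provider(rows).items():
        print(f"{provider}: mean score {mean:.1f}")
```

Per-provider means depend on how many models each provider has listed, so they say more about the spread of a lineup than about any single model.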

Related Benchmarks

Aider Polyglot

Berkeley Function-Calling Leaderboard

CodeForces