Compare model performance across standardized benchmarks that test different capabilities.

AI2 Reasoning Challenge (ARC) tests reasoning through grade-school science questions.

GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on expert-written, graduate-level questions in biology, physics, and chemistry.

HellaSwag tests commonsense natural language inference by asking models to choose the most plausible completion of an everyday scenario.
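
HellaSwag is multiple choice: each context comes with several candidate endings and one gold label. One common way to score a model (an assumption here, not something this leaderboard states) is to pick the ending with the highest log-likelihood, optionally divided by the ending's length. In the sketch below, the `logprob` callback, the helper names, and the toy scores are hypothetical stand-ins for a real model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer: total log-probability the model assigns to `ending`
# when it follows `context`. A real evaluation would query a language model.
LogProbFn = Callable[[str, str], float]


def pick_ending(context: str, endings: Sequence[str], logprob: LogProbFn,
                normalize: bool = True) -> int:
    """Return the index of the most likely ending."""
    best_idx, best_score = -1, float("-inf")
    for i, ending in enumerate(endings):
        score = logprob(context, ending)
        if normalize:
            # Divide by character length so longer endings are not
            # penalized just for having more tokens to predict.
            score /= max(len(ending), 1)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def accuracy(items: List[Tuple[str, List[str], int]], logprob: LogProbFn) -> float:
    """Fraction of (context, endings, gold_index) items answered correctly."""
    correct = sum(pick_ending(ctx, endings, logprob) == gold
                  for ctx, endings, gold in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy log-probabilities standing in for a real model.
    toy = {"then rinses it off.": -4.0, "then eats the sink.": -9.0}
    items = [("A man scrubs a plate and", list(toy), 0)]
    print(accuracy(items, lambda ctx, end: toy[end]))  # 1.0
```

Length normalization matters when candidate endings differ a lot in length; without it, the raw sum of log-probabilities tends to favor shorter endings.
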

HumanEval evaluates code generation by asking models to complete Python functions from their signatures and docstrings.
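
To make the HumanEval setup concrete, here is a minimal sketch of how a completion can be checked for functional correctness, plus the unbiased pass@k estimator described in the HumanEval paper. The `add_evens` task and the helper names are hypothetical illustrations, not items from the dataset, and a real harness would run generated code in a sandbox rather than calling `exec()` directly.

```python
from math import comb

# A toy HumanEval-style task (hypothetical, not taken from the dataset):
# the model sees only the signature and docstring and must write the body.
PROMPT = '''
def add_evens(nums: list[int]) -> int:
    """Return the sum of the even numbers in nums.
    >>> add_evens([1, 2, 3, 4])
    6
    """
'''

# One model-generated completion (the function body).
COMPLETION = "    return sum(n for n in nums if n % 2 == 0)\n"

# Hidden unit tests decide functional correctness.
TEST = '''
def check(candidate):
    assert candidate([1, 2, 3, 4]) == 6
    assert candidate([]) == 0
    assert candidate([7, 9]) == 0
'''


def passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Run prompt + completion against the tests. exec() is used here for
    illustration only; never run untrusted code outside a sandbox."""
    env: dict = {}
    try:
        exec(prompt + completion + test, env)
        env["check"](env[entry_point])
        return True
    except Exception:
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, of which c passed."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    print(passes(PROMPT, COMPLETION, TEST, "add_evens"))  # True
    print(round(pass_at_k(n=10, c=3, k=1), 3))            # 0.3
```

With k = 1 the estimator reduces to c / n, the plain fraction of passing samples; larger k rewards models that succeed in at least one of several attempts.
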

MATH is a dataset of 12,500 challenging competition mathematics problems that require multi-step reasoning.
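
MATH reference solutions mark the final answer with `\boxed{...}`, so a common grading approach is to extract that answer from both the model output and the reference and compare them. The sketch below assumes the model also boxes its answer and applies only whitespace normalization; the function names are hypothetical, and production graders apply many more equivalence rules (for example, treating `\frac{1}{2}` and `0.5` as equal).

```python
from typing import Optional


def last_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a LaTeX solution,
    tracking nested braces such as \\boxed{\\frac{3}{4}}."""
    start = solution.rfind("\\boxed{")
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in solution[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def is_correct(model_output: str, reference_solution: str) -> bool:
    """Exact match on boxed answers after removing all whitespace."""
    pred = last_boxed(model_output)
    gold = last_boxed(reference_solution)
    if pred is None or gold is None:
        return False
    return "".join(pred.split()) == "".join(gold.split())


if __name__ == "__main__":
    reference = r"Half of $\frac{3}{2}$ is $\boxed{\frac{3}{4}}$."
    print(last_boxed(reference))                                 # \frac{3}{4}
    print(is_correct(r"So $\boxed{ \frac{3}{4} }$", reference))  # True
```

Taking the last `\boxed{}` rather than the first is a deliberate choice: worked solutions sometimes box intermediate results before the final answer.
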

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects, including mathematics, history, law, and more.

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 81.3 | | 2025-04-16 |
2 | Gemini 2.5 Pro | Google | 76.5 | | 2025-05-06 |
3 | OpenAI o4-mini | OpenAI | 68.9 | | 2025-04-16 |
4 | Gemini 2.5 Flash | Google | 61.9 | | 2025-05-20 |
5 | Qwen-3 | Alibaba | 61.8 | 235B (22B active) | 2025-04-29 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o4-mini | OpenAI | 93.4 | | 2025-04-16 |
2 | Qwen-3 | Alibaba | 85.7 | 235B (22B active) | 2025-04-29 |
3 | Claude 3.7 Sonnet | Anthropic | 80 | | 2025-02-24 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o4-mini | OpenAI | 92.7 | | 2025-04-16 |
2 | Gemini 2.5 Pro | Google | 83 | | 2025-05-06 |
3 | Qwen-3 | Alibaba | 81.5 | 235B (22B active) | 2025-04-29 |
4 | Gemini 2.5 Flash | Google | 72 | | 2025-05-20 |
5 | Gemini Diffusion | Google | 23.3 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 91.6 | | 2025-04-16 |
2 | Claude Opus 4 | Anthropic | 33.9 | | 2025-05-22 |
3 | Claude Sonnet 4 | Anthropic | 33.1 | | 2025-05-22 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3 Opus | Anthropic | 96.4 | | 2024-03-04 |
2 | Claude 3 Sonnet | Anthropic | 93.2 | | 2024-03-04 |
3 | Claude 3 Haiku | Anthropic | 89.2 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Qwen-3 | Alibaba | 70.8 | 235B (22B active) | 2025-04-29 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 93.1 | | 2024-06-20 |
2 | Claude 3 Opus | Anthropic | 86.8 | | 2024-03-04 |
3 | Claude 3 Sonnet | Anthropic | 82.9 | | 2024-03-04 |
4 | Claude 3 Haiku | Anthropic | 73.7 | | 2024-03-04 |
5 | Gemini Diffusion | Google | 15 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 59.3 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 58.7 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 57.4 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 49.7 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 28.3 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 78.6 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 72 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o4-mini | OpenAI | 2,719 | | 2025-04-16 |
2 | OpenAI o3 | OpenAI | 2,706 | | 2025-04-16 |
3 | Qwen-3 | Alibaba | 2,056 | 235B (22B active) | 2025-04-29 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 40.6 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 39 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 38.4 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 87.1 | | 2024-06-20 |
2 | Claude 3.5 Haiku | Anthropic | 83.1 | | 2024-10-22 |
3 | Claude 3 Opus | Anthropic | 83.1 | | 2024-03-04 |
4 | Claude 3 Sonnet | Anthropic | 78.9 | | 2024-03-04 |
5 | Claude 3 Haiku | Anthropic | 78.4 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 71.9 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 71.1 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 67.2 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Flash | Google | 85.3 | | 2025-05-20 |
2 | Gemini 2.0 Flash | Google | 84.6 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 83.6 | | 2025-02-25 |
4 | Gemini 2.0 Pro | Google | 82.8 | | 2025-02-05 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 86.5 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 83.4 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 78.2 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 88.6 | | 2025-05-06 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.7 Sonnet | Anthropic | 84.8 | | 2025-02-24 |
2 | OpenAI o3 | OpenAI | 83.3 | | 2025-04-16 |
3 | Gemini 2.5 Pro | Google | 83 | | 2025-05-06 |
4 | Gemini 2.5 Flash | Google | 82.8 | | 2025-05-20 |
5 | OpenAI o4-mini | OpenAI | 81.4 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 96.4 | | 2024-06-20 |
2 | Claude 3 Opus | Anthropic | 95 | | 2024-03-04 |
3 | Claude 3 Sonnet | Anthropic | 92.3 | | 2024-03-04 |
4 | Qwen-2 | Alibaba | 89.5 | 72B | 2024-06-11 |
5 | Claude 3 Haiku | Anthropic | 88.9 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3 Opus | Anthropic | 95.4 | | 2024-03-04 |
2 | Claude 3 Sonnet | Anthropic | 89 | | 2024-03-04 |
3 | Claude 3 Haiku | Anthropic | 85.9 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 65.2 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 63.5 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 55.3 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 92 | | 2024-06-20 |
2 | Gemini Diffusion | Google | 89.6 | | 2025-05-20 |
3 | Claude 3.5 Haiku | Anthropic | 88.1 | | 2024-10-22 |
4 | Claude 3 Opus | Anthropic | 84.9 | | 2024-03-04 |
5 | Claude 3 Haiku | Anthropic | 75.9 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 17.8 | | 2025-05-06 |
2 | OpenAI o4-mini | OpenAI | 17.7 | | 2025-04-16 |
3 | Gemini 2.5 Flash | Google | 11 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Qwen-3 | Alibaba | 77.1 | 235B (22B active) | 2025-04-29 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 75.6 | | 2025-05-06 |
2 | Qwen-3 | Alibaba | 70.7 | 235B (22B active) | 2025-04-29 |
3 | Gemini 2.5 Flash | Google | 63.9 | | 2025-05-20 |
4 | Gemini 2.0 Pro | Google | 36 | | 2025-02-05 |
5 | Gemini 2.0 Flash | Google | 34.5 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.7 Sonnet | Anthropic | 96.2 | | 2025-02-24 |
2 | Gemini 2.0 Pro | Google | 91.8 | | 2025-02-05 |
3 | Gemini 2.0 Flash | Google | 90.9 | | 2025-02-25 |
4 | Gemini 2.0 Flash-Lite | Google | 86.8 | | 2025-02-25 |
5 | Claude 3.5 Sonnet | Anthropic | 71.1 | | 2024-06-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 86.8 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 84.3 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 91.6 | | 2024-06-20 |
2 | Claude 3 Opus | Anthropic | 90.7 | | 2024-03-04 |
3 | Claude 3.5 Haiku | Anthropic | 85.6 | | 2024-10-22 |
4 | Claude 3 Sonnet | Anthropic | 83.5 | | 2024-03-04 |
5 | Claude 3 Haiku | Anthropic | 75.1 | | 2024-03-04 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.0 Pro | Google | 79.1 | | 2025-02-05 |
2 | Gemini 2.0 Flash | Google | 77.6 | | 2025-02-25 |
3 | Gemini 2.0 Flash-Lite | Google | 71.6 | | 2025-02-25 |
4 | Qwen-2 | Alibaba | 55.6 | 72B | 2024-06-11 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.5 Sonnet | Anthropic | 88.7 | | 2024-06-20 |
2 | Gemini 2.5 Flash | Google | 88.4 | | 2025-05-20 |
3 | Claude Opus 4 | Anthropic | 87.4 | | 2025-05-22 |
4 | Claude 3 Opus | Anthropic | 86.8 | | 2024-03-04 |
5 | Claude 3.7 Sonnet | Anthropic | 86.1 | | 2025-02-24 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 82.9 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 81.6 | | 2025-04-16 |
3 | Gemini 2.5 Flash | Google | 79.7 | | 2025-05-20 |
4 | Gemini 2.5 Pro | Google | 79.6 | | 2025-05-06 |
5 | Claude 3.7 Sonnet | Anthropic | 75 | | 2025-02-24 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Flash | Google | 74 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 93 | | 2025-05-06 |
2 | Gemini 2.0 Pro | Google | 74.7 | | 2025-02-05 |
3 | Gemini 2.0 Flash | Google | 70.5 | | 2025-02-25 |
4 | Gemini 2.0 Flash-Lite | Google | 58 | | 2025-02-25 |
5 | Gemini 2.5 Flash | Google | 32 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Qwen-3 | Alibaba | 71.9 | 235B (22B active) | 2025-04-29 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 56.51 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 42.99 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 50.8 | | 2025-05-06 |
2 | Gemini 2.0 Pro | Google | 44.3 | | 2025-02-05 |
3 | Gemini 2.0 Flash | Google | 29.9 | | 2025-02-25 |
4 | Gemini 2.5 Flash | Google | 26.9 | | 2025-05-20 |
5 | Gemini 2.0 Flash-Lite | Google | 21.7 | | 2025-02-25 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude Sonnet 4 | Anthropic | 72.7 | | 2025-05-22 |
2 | Claude Opus 4 | Anthropic | 72.5 | | 2025-05-22 |
3 | Claude 3.7 Sonnet | Anthropic | 70.3 | | 2025-02-24 |
4 | OpenAI o3 | OpenAI | 69.1 | | 2025-04-16 |
5 | OpenAI o4-mini | OpenAI | 68.1 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | OpenAI o3 | OpenAI | 66,250 | | 2025-04-16 |
2 | OpenAI o4-mini | OpenAI | 56,375 | | 2025-04-16 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude 3.7 Sonnet | Anthropic | 81.2 | | 2025-02-24 |
2 | OpenAI o3 | OpenAI | 73.9 | | 2025-04-16 |
3 | OpenAI o4-mini | OpenAI | 71.8 | | 2025-04-16 |
4 | Claude 3.5 Haiku | Anthropic | 51 | | 2024-10-22 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Claude Opus 4 | Anthropic | 43.2 | | 2025-05-22 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 65.6 | | 2025-05-06 |
2 | Gemini 2.5 Flash | Google | 65.4 | | 2025-05-20 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemini 2.5 Pro | Google | 84.8 | | 2025-05-06 |

Rank | Model | Provider | Score | Parameters | Released |
---|---|---|---|---|---|
1 | Gemma 3n | Google | 50.1 | 4B | 2025-05-20 |