LLM Benchmarks

Compare model performance across standardized benchmarks that test different capabilities.

Common LLM Benchmarks

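| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 81.3 |  | 2025-04-16 |
| 2 | Gemini 2.5 Pro | Google | 76.5 |  | 2025-05-06 |
| 3 | o4-mini | OpenAI | 68.9 |  | 2025-04-16 |
| 4 | Gemini 2.5 Flash | Google | 61.9 |  | 2025-05-20 |
| 5 | Qwen-3 | Alibaba | 61.8 | 235B (22B active) | 2025-04-29 |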
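
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 3 Mini | xAI | 95.8 | Unknown | 2025-02-19 |
| 2 | o4-mini | OpenAI | 93.4 |  | 2025-04-16 |
| 3 | Qwen-3 | Alibaba | 85.7 | 235B (22B active) | 2025-04-29 |
| 4 | o1 | OpenAI | 83.3 |  | 2024-09-12 |
| 5 | Claude 3.7 Sonnet | Anthropic | 80 |  | 2025-02-24 |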
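
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 3 | xAI | 93.3 | Unknown (multi-trillion estimated) | 2025-02-19 |
| 2 | o4-mini | OpenAI | 92.7 |  | 2025-04-16 |
| 3 | Grok 4 | xAI | 91.7 | Unknown | 2025-07-09 |
| 4 | Grok 3 Mini | xAI | 90.8 | Unknown | 2025-02-19 |
| 5 | Gemini 2.5 Pro | Google | 83 |  | 2025-05-06 |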
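
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 91.6 |  | 2025-04-16 |
| 2 | o1-mini | OpenAI | 70 |  | 2024-09-12 |
| 3 | o1-preview | OpenAI | 44.6 |  | 2024-09-12 |
| 4 | Claude Opus 4 | Anthropic | 33.9 |  | 2025-05-22 |
| 5 | Claude Sonnet 4 | Anthropic | 33.1 |  | 2025-05-22 |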
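
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3 Opus | Anthropic | 96.4 |  | 2024-03-04 |
| 2 | Claude 3 Sonnet | Anthropic | 93.2 |  | 2024-03-04 |
| 3 | Claude 3 Haiku | Anthropic | 89.2 |  | 2024-03-04 |
| 4 | Mixtral 8×22B | Mistral AI | 70 | 141B (39B active) | 2024-04-17 |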

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen-3 | Alibaba | 70.8 | 235B (22B active) | 2025-04-29 |
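
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3.5 Sonnet | Anthropic | 93.1 |  | 2024-06-20 |
| 2 | Claude 3 Opus | Anthropic | 86.8 |  | 2024-03-04 |
| 3 | Claude 3 Sonnet | Anthropic | 82.9 |  | 2024-03-04 |
| 4 | Claude 3 Haiku | Anthropic | 73.7 |  | 2024-03-04 |
| 5 | Gemini Diffusion | Google | 15 |  | 2025-05-20 |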
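
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.0 Pro | Google | 59.3 |  | 2025-02-05 |
| 2 | Gemini 2.0 Flash | Google | 58.7 |  | 2025-02-25 |
| 3 | Gemini 2.0 Flash-Lite | Google | 57.4 |  | 2025-02-25 |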
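
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 49.7 |  | 2025-04-16 |
| 2 | o4-mini | OpenAI | 28.3 |  | 2025-04-16 |
| 3 | o1 | OpenAI | 9.9 |  | 2024-09-12 |
| 4 | GPT-4.5 | OpenAI | 0.9 |  | 2025-02-27 |
| 5 | GPT-4o | OpenAI | 0.6 |  | 2024-05-13 |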
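
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 78.6 |  | 2025-04-16 |
| 2 | o4-mini | OpenAI | 72 |  | 2025-04-16 |
| 3 | GPT-4.1 | OpenAI | 56.7 |  | 2025-04-14 |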
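
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o4-mini | OpenAI | 2,719 |  | 2025-04-16 |
| 2 | o3 | OpenAI | 2,706 |  | 2025-04-16 |
| 3 | Qwen-3 | Alibaba | 2,056 | 235B (22B active) | 2025-04-29 |
| 4 | DeepSeek-R1 | DeepSeek | 2,029 | 671B (37B activated) | 2025-01-20 |
| 5 | o1 | OpenAI | 1,673 |  | 2024-09-12 |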
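
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.0 Pro | Google | 40.6 |  | 2025-02-05 |
| 2 | Gemini 2.0 Flash | Google | 39 |  | 2025-02-25 |
| 3 | Gemini 2.0 Flash-Lite | Google | 38.4 |  | 2025-02-25 |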

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o1-preview | OpenAI | 43 |  | 2024-09-12 |
| 2 | o1-mini | OpenAI | 28.7 |  | 2024-09-12 |
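
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | DeepSeek-R1 | DeepSeek | 92.2 | 671B (37B activated) | 2025-01-20 |
| 2 | DeepSeek-V3 | DeepSeek | 91.6 | 671B total, 37B activated | 2024-12-26 |
| 3 | Claude 3.5 Sonnet | Anthropic | 87.1 |  | 2024-06-20 |
| 4 | GPT-4o | OpenAI | 83.4 |  | 2024-05-13 |
| 5 | Claude 3.5 Haiku | Anthropic | 83.1 |  | 2024-10-22 |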
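
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 3 | xAI | 74.5 | Unknown (multi-trillion estimated) | 2025-02-19 |
| 2 | Grok 3 Mini | xAI | 74.3 | Unknown | 2025-02-19 |
| 3 | Gemini 2.0 Pro | Google | 71.9 |  | 2025-02-05 |
| 4 | Gemini 2.0 Flash | Google | 71.1 |  | 2025-02-25 |
| 5 | Gemini 2.0 Flash-Lite | Google | 67.2 |  | 2025-02-25 |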
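
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 87.8 |  | 2025-05-06 |
| 2 | Gemini 2.5 Flash | Google | 85.8 |  | 2025-05-20 |
| 3 | Gemini 2.0 Flash | Google | 85.6 |  | 2025-02-25 |
| 4 | Gemini 2.5 Flash-Lite | Google | 83.8 |  | 2025-06-17 |
| 5 | Claude 3.5 Sonnet | Anthropic | 83.3 |  | 2024-06-20 |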
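
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.0 Pro | Google | 86.5 |  | 2025-02-05 |
| 2 | Gemini 2.5 Flash-Lite | Google | 84.5 |  | 2025-06-17 |
| 3 | Gemini 2.0 Flash | Google | 83.4 |  | 2025-02-25 |
| 4 | Gemini 2.0 Flash-Lite | Google | 78.2 |  | 2025-02-25 |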

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 88.6 |  | 2025-05-06 |
| 2 | Gemma 3 | Google | 75.4 | 1B, 4B, 12B, 27B | 2025-03-12 |
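
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 4 | xAI | 87.5 | Unknown | 2025-07-09 |
| 2 | Claude 3.7 Sonnet | Anthropic | 84.8 |  | 2025-02-24 |
| 3 | Grok 3 Mini | xAI | 84 | Unknown | 2025-02-19 |
| 4 | o3 | OpenAI | 83.3 |  | 2025-04-16 |
| 5 | Gemini 2.5 Pro | Google | 83 |  | 2025-05-06 |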
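
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3.5 Sonnet | Anthropic | 96.4 |  | 2024-06-20 |
| 2 | Kimi K2 | Moonshot AI | 95 | 1T total, 32B activated | 2025-07-11 |
| 3 | Claude 3 Opus | Anthropic | 95 |  | 2024-03-04 |
| 4 | Claude 3 Sonnet | Anthropic | 92.3 |  | 2024-03-04 |
| 5 | Qwen-2 | Alibaba | 89.5 | 72B | 2024-06-11 |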

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3 Opus | Anthropic | 95.4 |  | 2024-03-04 |
| 2 | Claude 3 Sonnet | Anthropic | 89 |  | 2024-03-04 |
| 3 | DeepSeek-V3 | DeepSeek | 88.9 | 671B total, 37B activated | 2024-12-26 |
| 4 | Mixtral 8×22B | Mistral AI | 88 | 141B (39B active) | 2024-04-17 |
| 5 | Claude 3 Haiku | Anthropic | 85.9 |  | 2024-03-04 |
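
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.0 Pro | Google | 65.2 |  | 2025-02-05 |
| 2 | Gemini 2.0 Flash | Google | 63.5 |  | 2025-02-25 |
| 3 | Gemini 2.0 Flash-Lite | Google | 55.3 |  | 2025-02-25 |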
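
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o1-mini | OpenAI | 92.4 |  | 2024-09-12 |
| 2 | o1-preview | OpenAI | 92.4 |  | 2024-09-12 |
| 3 | Claude 3.5 Sonnet | Anthropic | 92 |  | 2024-06-20 |
| 4 | GPT-4o | OpenAI | 90.2 |  | 2024-05-13 |
| 5 | Gemini Diffusion | Google | 89.6 |  | 2025-05-20 |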
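
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 4 | xAI | 38.6 | Unknown | 2025-07-09 |
| 2 | Gemini 2.5 Pro | Google | 17.8 |  | 2025-05-06 |
| 3 | o4-mini | OpenAI | 17.7 |  | 2025-04-16 |
| 4 | Gemini 2.5 Flash | Google | 11 |  | 2025-05-20 |
| 5 | Gemini 2.5 Flash-Lite | Google | 6.9 |  | 2025-06-17 |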

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen-3 | Alibaba | 77.1 | 235B (22B active) | 2025-04-29 |
| 2 | Kimi K2 | Moonshot AI | 76.4 | 1T total, 32B activated | 2025-07-11 |
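
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 4 | xAI | 79.4 | Unknown | 2025-07-09 |
| 2 | Gemini 2.5 Pro | Google | 75.6 |  | 2025-05-06 |
| 3 | Qwen-3 | Alibaba | 70.7 | 235B (22B active) | 2025-04-29 |
| 4 | Gemini 2.5 Flash | Google | 63.9 |  | 2025-05-20 |
| 5 | Grok 3 | xAI | 57 | Unknown (multi-trillion estimated) | 2025-02-19 |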

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 3 | xAI | 83.3 | Unknown (multi-trillion estimated) | 2025-02-19 |
| 2 | Grok 3 Mini | xAI | 83.1 | Unknown | 2025-02-19 |
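
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | DeepSeek-R1 | DeepSeek | 97.3 | 671B (37B activated) | 2025-01-20 |
| 2 | o1-mini | OpenAI | 90 |  | 2024-09-12 |
| 3 | o1-preview | OpenAI | 85.5 |  | 2024-09-12 |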
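
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Kimi K2 | Moonshot AI | 97.4 | 1T total, 32B activated | 2025-07-11 |
| 2 | Claude 3.7 Sonnet | Anthropic | 96.2 |  | 2025-02-24 |
| 3 | o1 | OpenAI | 94.8 |  | 2024-09-12 |
| 4 | Gemini 2.0 Pro | Google | 91.8 |  | 2025-02-05 |
| 5 | Gemini 2.0 Flash | Google | 90.9 |  | 2025-02-25 |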
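
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 86.8 |  | 2025-04-16 |
| 2 | o4-mini | OpenAI | 84.3 |  | 2025-04-16 |
| 3 | o1 | OpenAI | 73.9 |  | 2024-09-12 |
| 4 | GPT-4.1 | OpenAI | 72.2 |  | 2025-04-14 |
| 5 | Grok-2 | xAI | 69 | Unknown | 2024-08-13 |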
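
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3.5 Sonnet | Anthropic | 91.6 |  | 2024-06-20 |
| 2 | Claude 3 Opus | Anthropic | 90.7 |  | 2024-03-04 |
| 3 | GPT-4o | OpenAI | 90.5 |  | 2024-05-13 |
| 4 | Claude 3.5 Haiku | Anthropic | 85.6 |  | 2024-10-22 |
| 5 | Claude 3 Sonnet | Anthropic | 83.5 |  | 2024-03-04 |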

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | DeepSeek-R1 | DeepSeek | 84 | 671B (37B activated) | 2025-01-20 |
| 2 | Kimi K2 | Moonshot AI | 81.1 | 1T total, 32B activated | 2025-07-11 |
| 3 | Grok 3 | xAI | 79.9 | Unknown (multi-trillion estimated) | 2025-02-19 |
| 4 | Gemini 2.0 Pro | Google | 79.1 |  | 2025-02-05 |
| 5 | Grok 3 Mini | xAI | 78.9 | Unknown | 2025-02-19 |
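
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o1 | OpenAI | 92.3 |  | 2024-09-12 |
| 2 | DeepSeek-R1 | DeepSeek | 90.8 | 671B (37B activated) | 2025-01-20 |
| 3 | o1-preview | OpenAI | 90.8 |  | 2024-09-12 |
| 4 | GPT-4.1 | OpenAI | 90.2 |  | 2025-04-14 |
| 5 | Kimi K2 | Moonshot AI | 89.5 | 1T total, 32B activated | 2025-07-11 |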
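
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 82.9 |  | 2025-04-16 |
| 2 | o4-mini | OpenAI | 81.6 |  | 2025-04-16 |
| 3 | Gemini 2.5 Flash | Google | 79.7 |  | 2025-05-20 |
| 4 | Gemini 2.5 Pro | Google | 79.6 |  | 2025-05-06 |
| 5 | o1 | OpenAI | 78.2 |  | 2024-09-12 |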

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Flash | Google | 74 |  | 2025-05-20 |
| 2 | Gemini 2.5 Flash-Lite | Google | 30.6 |  | 2025-06-17 |
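
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 93 |  | 2025-05-06 |
| 2 | Gemini 2.0 Pro | Google | 74.7 |  | 2025-02-05 |
| 3 | Gemini 2.0 Flash | Google | 70.5 |  | 2025-02-25 |
| 4 | Gemini 2.0 Flash-Lite | Google | 58 |  | 2025-02-25 |
| 5 | Gemini 2.5 Flash | Google | 32 |  | 2025-05-20 |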

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Qwen-3 | Alibaba | 71.9 | 235B (22B active) | 2025-04-29 |
| 2 | GPT-4.1 | OpenAI | 70.8 |  | 2025-04-14 |

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | o3 | OpenAI | 56.51 |  | 2025-04-16 |
| 2 | o4-mini | OpenAI | 42.99 |  | 2025-04-16 |

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 50.8 |  | 2025-05-06 |
| 2 | Gemini 2.0 Pro | Google | 44.3 |  | 2025-02-05 |
| 3 | Grok 3 | xAI | 43.6 | Unknown (multi-trillion estimated) | 2025-02-19 |
| 4 | Kimi K2 | Moonshot AI | 31 | 1T total, 32B activated | 2025-07-11 |
| 5 | DeepSeek-R1 | DeepSeek | 30.1 | 671B (37B activated) | 2025-01-20 |
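
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Sonnet 4 | Anthropic | 72.7 |  | 2025-05-22 |
| 2 | Claude Opus 4 | Anthropic | 72.5 |  | 2025-05-22 |
| 3 | Claude 3.7 Sonnet | Anthropic | 70.3 |  | 2025-02-24 |
| 4 | o3 | OpenAI | 69.1 |  | 2025-04-16 |
| 5 | o4-mini | OpenAI | 68.1 |  | 2025-04-16 |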
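
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4.5 | OpenAI | 186,125 |  | 2025-02-27 |
| 2 | o3 | OpenAI | 66,250 |  | 2025-04-16 |
| 3 | o4-mini | OpenAI | 56,375 |  | 2025-04-16 |
| 4 | GPT-4.1 | OpenAI | 35.1 |  | 2025-04-14 |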
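
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude 3.7 Sonnet | Anthropic | 81.2 |  | 2025-02-24 |
| 2 | o3 | OpenAI | 73.9 |  | 2025-04-16 |
| 3 | o4-mini | OpenAI | 71.8 |  | 2025-04-16 |
| 4 | Claude 3.5 Haiku | Anthropic | 51 |  | 2024-10-22 |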

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus 4 | Anthropic | 43.2 |  | 2025-05-22 |

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemma 3 | Google | 68.7 | 1B, 4B, 12B, 27B | 2025-03-12 |
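
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Grok 4 | xAI | 4,694.15 | Unknown | 2025-07-09 |
| 2 | Claude 3.5 Sonnet | Anthropic | 2,217.93 |  | 2024-06-20 |
| 3 | Claude Opus 4 | Anthropic | 2,077.41 |  | 2025-05-22 |
| 4 | o3 | OpenAI | 1,843.11 |  | 2025-04-16 |
| 5 | Claude 3.7 Sonnet | Anthropic | 1,567.9 |  | 2025-02-24 |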
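
| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 65.6 |  | 2025-05-06 |
| 2 | Gemini 2.5 Flash | Google | 65.4 |  | 2025-05-20 |
| 3 | Gemini 2.5 Flash-Lite | Google | 57.5 |  | 2025-06-17 |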

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini 2.5 Pro | Google | 84.8 |  | 2025-05-06 |

| Rank | Model | Provider | Score | Parameters | Released |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemma 3n | Google | 50.1 | 4B | 2025-05-20 |