OpenAI logo

o4-mini

OpenAIProprietaryVerified

A smaller model optimized for fast, cost-efficient reasoning. Achieves remarkable performance for its size and cost, particularly in math, coding, and visual tasks. It is the best-performing benchmarked model on AIME 2024 and 2025, with significantly higher usage limits than o3.

2025-04-16
Decoder-only Transformer with reinforcement learning
Proprietary

Specifications

Architecture
Decoder-only Transformer with reinforcement learning
License
Proprietary
Type
multimodal
Modalities
textvision

Benchmark Scores

AIME-202493.4

American Invitational Mathematics Examination (AIME) 2024 problems....

AIME-202592.7

American Invitational Mathematics Examination (AIME) 2025 problems....

GPQA81.4

Graduate-level Problems in Quantitative Analysis (GPQA) evaluates advanced reasoning on graduate-lev...

Humanitys-Last-Exam17.7

A challenging benchmark of novel problems designed to test the limits of AI capabilities....

SWE-bench68.1

Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks...

MMMU81.6

Massive Multi-discipline Multimodal Understanding (MMMU) evaluates multimodal understanding across 3...

MathVista84.3

Evaluates mathematical reasoning in visual contexts, combining vision and mathematical problem-solvi...

CharXiv-Reasoning72

Tests reasoning on challenging problems from arXiv papers across multiple scientific domains....

Codeforces2,719

Evaluates models on competitive programming problems from the Codeforces platform....

SWE-Lancer56,375

Evaluates models on real-world freelance software engineering tasks....

Aider-Polyglot68.9

Tests models on their ability to write code in multiple programming languages....

Scale-MultiChallenge42.99

A multi-domain challenge set created by Scale AI to test models across diverse tasks....

BrowseComp28.3

Evaluates models on their ability to browse and interact with web interfaces....

TAU-bench71.8

Tool Augmented Understanding Benchmark (TAU-bench) evaluates models on their ability to use tools....

See all benchmarks

Advanced Specifications

Model Family
o-series
API Access
Available
Chat Interface
Available
Multilingual Support
Yes

Capabilities & Limitations

Capabilities
codemathreasoningvisual perceptiontool useagentic
Known Limitations
May make errors in complex reasoning tasks
Function Calling Support
Yes
Tool Use Support
Yes

Related Models