o4-mini
A smaller model optimized for fast, cost-efficient reasoning. Achieves remarkable performance for its size and cost, particularly in math, coding, and visual tasks. It is the best-performing benchmarked model on AIME 2024 and 2025, with significantly higher usage limits than o3.
Specifications
- Architecture
- Decoder-only Transformer trained with reinforcement learning
- License
- Proprietary
- Type
- Multimodal
- Modalities
- Text, vision (see the example after this list)
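As an illustration of the text + vision modalities, the sketch below sends a combined text-and-image request. It assumes the standard OpenAI Python SDK, an `OPENAI_API_KEY` environment variable, and a placeholder image URL; it is a minimal sketch, not an official example.

```python
# Minimal sketch: a text + image request to o4-mini via the OpenAI Python SDK's
# Chat Completions API. Assumes OPENAI_API_KEY is set in the environment and that
# the image URL below is a placeholder you replace with your own.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What shape is shown in this figure, and what is its area?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/figure.png"},  # placeholder URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```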
Benchmark Scores
- GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on graduate-level science questions.
- A challenging benchmark of novel problems designed to test the limits of AI capabilities.
- Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks.
- MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI with 11.5K questions.
- Evaluates mathematical reasoning in visual contexts, combining vision and mathematical problem-solving.
- Tests reasoning on challenging problems from arXiv papers across multiple scientific domains.
- Advanced competitive programming benchmark for evaluating large language models on algorithmic problems.
- A benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD in real-world payouts.
- A multi-domain challenge set created by Scale AI to test models across diverse tasks.
- A benchmark measuring browsing agents' ability to navigate the web and find hard-to-find, entangled information.
- TAU-bench (Tool-Agent-User benchmark) evaluates models on their ability to use tools.
Advanced Specifications
- Model Family
- o-series
- API Access
- Available (a minimal usage sketch follows this list)
- Chat Interface
- Available
- Multilingual Support
- Yes
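To illustrate API access, the following is a minimal sketch of a plain-text request, assuming the standard OpenAI Python SDK and the model identifier `o4-mini`. The `reasoning_effort` parameter is shown because it is exposed for o-series reasoning models; treat its exact name and accepted values as an assumption here.

```python
# Minimal sketch of a plain-text request to o4-mini, assuming the standard OpenAI
# Python SDK. The reasoning_effort parameter is exposed for o-series reasoning
# models; its availability and accepted values are an assumption in this sketch.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="medium",  # assumed values: "low" | "medium" | "high"
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."},
    ],
)

print(response.choices[0].message.content)
```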
Capabilities & Limitations
- Capabilities
- Code, math, reasoning, visual perception, tool use, agentic
- Known Limitations
- May make errors in complex reasoning tasks
- Function Calling Support
- Yes
- Tool Use Support
- Yes (see the sketch below)
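To illustrate function calling and tool use support, the sketch below declares a single hypothetical `get_weather` tool and lets the model decide whether to call it. It assumes the OpenAI Python SDK's Chat Completions tools interface; the tool itself is made up for illustration.

```python
# Hedged sketch of function calling with o4-mini, assuming the OpenAI Python SDK's
# Chat Completions tools interface. The get_weather tool is hypothetical and only
# illustrates the schema the model receives.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Return the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "How warm is it in Lisbon right now?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments it produced.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```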