o4-mini
A smaller model optimized for fast, cost-efficient reasoning. Achieves remarkable performance for its size and cost, particularly in math, coding, and visual tasks. It is the best-performing benchmarked model on AIME 2024 and 2025, with significantly higher usage limits than o3.
Specifications
- Architecture
- Decoder-only Transformer trained with reinforcement learning
- License
- Proprietary
- Type
- Multimodal
- Modalities
- Text, vision (see the example after this list)
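As an illustration of the text + vision modalities, the sketch below sends a combined text-and-image request. It assumes the standard OpenAI Python SDK, an `OPENAI_API_KEY` environment variable, and a placeholder image URL; it is a minimal sketch, not an official example.

```python
# Minimal sketch: a text + image request to o4-mini via the OpenAI Python SDK's
# Chat Completions API. Assumes OPENAI_API_KEY is set in the environment and that
# the image URL below is a placeholder you replace with your own.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What shape is shown in this figure, and what is its area?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/figure.png"},  # placeholder URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```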
Benchmark Scores
- GPQA (Graduate-Level Google-Proof Q&A) evaluates advanced reasoning on graduate-level science questions.
- A challenging benchmark of novel problems designed to test the limits of AI capabilities.
- Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks.
- MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI with 11.5K questions.
- Evaluates mathematical reasoning in visual contexts, combining vision and mathematical problem-solving.
- Tests reasoning on challenging problems from arXiv papers across multiple scientific domains.
- Advanced competitive programming benchmark for evaluating large language models on algorithmic problems.
- A benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD in real-world payouts.
- A multi-domain challenge set created by Scale AI to test models across diverse tasks.
- A benchmark measuring browsing agents' ability to navigate the web and find hard-to-find, entangled information.
- TAU-bench (Tool-Agent-User benchmark) evaluates models on their ability to use tools.
Advanced Specifications
- Model Family
- o-series
- API Access
- Available (a minimal usage sketch follows this list)
- Chat Interface
- Available
- Multilingual Support
- Yes
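To illustrate API access, the following is a minimal sketch of a plain-text request, assuming the standard OpenAI Python SDK and the model identifier `o4-mini`. The `reasoning_effort` parameter is shown because it is exposed for o-series reasoning models; treat its exact name and accepted values as an assumption here.

```python
# Minimal sketch of a plain-text request to o4-mini, assuming the standard OpenAI
# Python SDK. The reasoning_effort parameter is exposed for o-series reasoning
# models; its availability and accepted values are an assumption in this sketch.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="medium",  # assumed values: "low" | "medium" | "high"
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."},
    ],
)

print(response.choices[0].message.content)
```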
Capabilities & Limitations
- Capabilities
- Code, math, reasoning, visual perception, tool use, agentic
- Known Limitations
- May make errors in complex reasoning tasks
- Function Calling Support
- Yes
- Tool Use Support
- Yes (see the sketch below)
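To illustrate function calling and tool use support, the sketch below declares a single hypothetical `get_weather` tool and lets the model decide whether to call it. It assumes the OpenAI Python SDK's Chat Completions tools interface; the tool itself is made up for illustration.

```python
# Hedged sketch of function calling with o4-mini, assuming the OpenAI Python SDK's
# Chat Completions tools interface. The get_weather tool is hypothetical and only
# illustrates the schema the model receives.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Return the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "How warm is it in Lisbon right now?"}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments it produced.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```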