o3
OpenAI's most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more. Sets new state-of-the-art on benchmarks including Codeforces, SWE-bench, and MMMU. Features full tool access and can agentically use and combine every tool within ChatGPT.
Specifications
- Architecture
- Decoder-only Transformer with reinforcement learning
- License
- Proprietary
- Type
- multimodal
- Modalities
- textvision
Benchmark Scores
American Invitational Mathematics Examination (AIME) problems test advanced mathematical problem-sol...
Graduate-level Problems in Quantitative Analysis (GPQA) evaluates advanced reasoning on graduate-lev...
Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks...
Massive Multi-discipline Multimodal Understanding (MMMU) evaluates multimodal understanding across 3...
Evaluates mathematical reasoning in visual contexts, combining vision and mathematical problem-solvi...
Tests reasoning on challenging problems from arXiv papers across multiple scientific domains....
Evaluates models on competitive programming problems from the Codeforces platform....
Evaluates models on real-world freelance software engineering tasks....
Tests models on their ability to write code in multiple programming languages....
A multi-domain challenge set created by Scale AI to test models across diverse tasks....
Evaluates models on their ability to browse and interact with web interfaces....
Tool Augmented Understanding Benchmark (TAU-bench) evaluates models on their ability to use tools....
Advanced Specifications
- Model Family
- o-series
- API Access
- Available
- Chat Interface
- Available
- Multilingual Support
- Yes
Capabilities & Limitations
- Capabilities
- codemathreasoningvisual perceptiontool useagentic
- Known Limitations
- May make errors in complex reasoning tasks
- Function Calling Support
- Yes
- Tool Use Support
- Yes