
Claude Opus 4

Anthropic | Proprietary | Verified

Claude Opus 4 is Anthropic's most powerful model, described by Anthropic as the best coding model in the world, with leading scores on SWE-bench (72.5%) and Terminal-bench (43.2%). It sustains performance on long-running tasks that require focused effort and thousands of steps, and can work continuously for several hours. Features include extended thinking with tool use, parallel tool execution, improved memory capabilities, and significantly reduced shortcut behaviors.

Released 2025-05-22 | Decoder-only Transformer | Proprietary

Specifications

Architecture: Decoder-only Transformer
License: Proprietary
Context Window: 200,000 tokens
Max Output: 32,000 tokens
Training Data Cutoff: Mar 2025
Type: multimodal
Modalities: text, image
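
The context-window and max-output figures above bound how large a single request can be. The sketch below shows one way to check a request against them. It assumes that prompt and output tokens share the same 200,000-token window; that shared-budget rule and the helper name `plan_output_budget` are illustrative assumptions, not something this page specifies.

```python
# Limits taken from the specification list above.
CONTEXT_WINDOW_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 32_000


def plan_output_budget(prompt_tokens: int, requested_output_tokens: int) -> int:
    """Return an output-token budget that fits alongside the prompt.

    Assumption: prompt and output tokens draw on the same context window.
    Raises ValueError when the inputs are invalid or no room is left.
    """
    if prompt_tokens < 0 or requested_output_tokens <= 0:
        raise ValueError("prompt_tokens must be >= 0 and requested output > 0")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt fills the context window")
    # The budget is capped by the request, the output limit, and the space left.
    return min(requested_output_tokens, MAX_OUTPUT_TOKENS, remaining)
```

Under these assumptions, a 150,000-token prompt asking for 40,000 output tokens would be capped at 32,000, while a 190,000-token prompt would be capped at the 10,000 tokens that remain.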

Benchmark Scores

MMLU: 87.4

Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects including mathema...

GPQA: 74.9

Graduate-Level Google-Proof Q&A (GPQA) evaluates advanced reasoning on graduate-lev...

Software Engineering Benchmark (SWE-bench) evaluates models on real-world software engineering tasks...

Evaluates models on their ability to use terminal commands to solve system administration tasks....

MMMU: 73.7

A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI with 11.5...

AIME: 33.9

American Invitational Mathematics Examination (AIME) problems test advanced mathematical problem-sol...

American Invitational Mathematics Examination (AIME) 2025 problems....

The FACTS Grounding Leaderboard evaluates LLMs' ability to generate factually accurate long-form res...

Testing long-term coherence in agents by simulating a vending machine business. Agents manage orderi...


Advanced Specifications

Model Family: Claude
API Access: Available
Chat Interface: Available
Multilingual Support: Yes

Capabilities & Limitations

Capabilities: coding, reasoning, tool use, memory, parallel tool execution, extended thinking
Notable Use Cases: coding, complex problem-solving, agent workflows, long-running tasks
Function Calling Support: Yes
Tool Use Support: Yes
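
Parallel tool execution means that several independent tool calls can be in flight at once instead of one after another. The sketch below illustrates the client side of that idea with Python's standard `concurrent.futures`. The tool functions, the `TOOLS` registry, and the shape of each call dictionary (`id`, `name`, `arguments`) are hypothetical and chosen only for illustration; they are not the format of any particular API.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical local tools; names and behavior are illustrative only.
def get_word_count(text: str) -> int:
    return len(text.split())


def to_upper(text: str) -> str:
    return text.upper()


TOOLS = {"get_word_count": get_word_count, "to_upper": to_upper}


def run_tool_calls(calls):
    """Run independent tool calls concurrently; results keep request order."""

    def run_one(call):
        tool = TOOLS[call["name"]]
        return {"id": call["id"], "result": tool(**call["arguments"])}

    # At least one worker is required even when the call list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        # Executor.map yields results in the order the calls were submitted.
        return list(pool.map(run_one, calls))
```

Returning results in request order, each tagged with its `id`, keeps it simple to match every result to the call that produced it. Threads suit I/O-bound tools; CPU-bound tools would likely need a process pool instead.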

Related Models