Claude Opus 4.6
Claude Opus 4.6 is Anthropic's most intelligent model, improving on its predecessor's coding, reasoning, and agentic capabilities. It plans more carefully, sustains agentic tasks for longer, operates more reliably in larger codebases, and is stronger at code review and debugging. In a first for Opus-class models, it offers a 1M-token context window (beta) and up to 128K output tokens. Opus 4.6 achieves state-of-the-art results on Terminal-Bench 2.0 (65.4%), leads on Humanity's Last Exam, BrowseComp, and GDPval-AA, and scores 76% on the 8-needle 1M-token variant of MRCR v2 (vs. 18.5% for Sonnet 4.5), a qualitative leap in long-context performance. It also introduces adaptive thinking and four effort levels (low, medium, high, max) for fine-grained control over intelligence, speed, and cost.
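The snippet below is a minimal sketch of calling Opus 4.6 through the Anthropic Messages API with an explicit effort setting. The model ID is the variant listed under Advanced Specifications, and the `effort` request field is an assumed name for the effort control described above, so check the official API reference for the exact parameter.

```python
# Minimal sketch: one Messages API call to Claude Opus 4.6 with an explicit
# effort setting. Assumptions: the model ID is the variant listed under
# Advanced Specifications, and effort control is exposed as a request field
# named "effort" (hypothetical -- confirm the exact name in the API docs).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6-20260205",   # variant ID from this card
    max_tokens=4096,                    # well under the 128K output ceiling
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
    extra_body={"effort": "medium"},    # assumed field name for effort control
)
print(response.content[0].text)
```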
Specifications
- Parameters: Unreleased
- Architecture: Decoder-only Transformer
- License: Proprietary
- Context Window: 200,000 tokens (1M tokens in beta; see the sketch after this list)
- Max Output: 128,000 tokens
- Training Data Cutoff: Aug 2025
- Type: Multimodal
- Modalities: text, image
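The context-window entry above references the following sketch, which shows one way the 1M-token beta might be enabled. The beta flag is the one documented for earlier 1M-context betas and may differ for Opus 4.6, and the input file is a stand-in for any long document.

```python
# Sketch: opting into the 1M-token context window (beta) with a large output
# budget. The beta flag below is the one documented for earlier 1M-context
# betas ("context-1m-2025-08-07"); whether Opus 4.6 reuses it is an assumption.
import anthropic

client = anthropic.Anthropic()

# Hypothetical long input -- any document that exceeds the 200K default window.
with open("large_codebase_dump.txt") as f:
    long_document = f.read()

response = client.beta.messages.create(
    model="claude-opus-4-6-20260205",
    betas=["context-1m-2025-08-07"],    # assumed beta flag for the 1M window
    max_tokens=32_000,                  # max output can go up to 128K tokens
    messages=[{
        "role": "user",
        "content": f"{long_document}\n\nSummarize the architecture of this codebase.",
    }],
)
print(response.content[0].text)
```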
Benchmark Scores
- Terminal-Bench: evaluates models on their ability to use terminal commands to solve system administration tasks.
- BrowseComp: measures browsing agents' ability to navigate the web and find hard-to-find, entangled information.
- Humanity's Last Exam: a challenging benchmark of novel problems designed to test the limits of AI capabilities.
- GPQA (Graduate-Level Google-Proof Q&A): evaluates advanced reasoning on graduate-level science questions.
- MMMU: a massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI, with 11.5K college-level questions.
- MMLU (Massive Multitask Language Understanding): tests knowledge across 57 subjects, including mathematics, history, and law.
- MRCR (Multi-Round Coreference Resolution): part of the Michelangelo benchmark suite; evaluates long-context recall across multi-turn conversations.
Advanced Specifications
- Model Family: Claude
- Finetuned From: Claude 4
- API Access: Available
- Chat Interface: Available
- Multilingual Support: Yes
- Variants: claude-opus-4-6-20260205
Capabilities & Limitations
- Capabilities: deep reasoning, agentic coding, complex coding, research, adaptive thinking, effort control, context compaction, computer use, tool use, parallel tool execution, extended thinking, vision, multilingual, long context
- Known Limitations: higher latency than Sonnet and Haiku; may overthink simpler tasks at the default high effort; knowledge cutoff limits real-time data without tools
- Notable Use Cases: agentic coding, deep research, complex system architecture, multi-step agentic workflows, financial analysis, cybersecurity, long-context document analysis, code review and debugging
- Function Calling Support: Yes
- Tool Use Support: Yes (see the sketch below)
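Because function calling and tool use are listed as supported, the sketch below illustrates a tool-use request using Anthropic's documented tools/input_schema format; the get_current_time tool is a hypothetical example, not part of this card.

```python
# Sketch of a tool-use request, since the card lists function calling and tool
# use as supported. The tools/input_schema shape follows Anthropic's documented
# format; the get_current_time tool itself is a made-up example.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_current_time",          # hypothetical tool for illustration
    "description": "Return the current time in a given IANA timezone.",
    "input_schema": {
        "type": "object",
        "properties": {"timezone": {"type": "string"}},
        "required": ["timezone"],
    },
}]

response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What time is it in Tokyo?"}],
)

# The model does not execute anything itself: if it decides to call the tool,
# the response carries a tool_use block whose input the caller runs, then
# returns as a tool_result message in a follow-up request.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```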