Qwen3 235B (MoE)
The flagship model of the Qwen3 family, built on a Mixture-of-Experts architecture with 235B total parameters, of which 22B are active per token. It introduces hybrid reasoning, allowing it to switch between standard generation and a deeper 'Thinking Mode' for complex logic (a usage sketch follows below). The model was trained on 36 trillion tokens and supports 119 languages and dialects. The wider Qwen3 family spans six dense models (0.6B to 32B parameters) and two MoE models (30B total/3B active and 235B total/22B active).
Release Date: 2025-04-29
Parameters: 235B (22B active)
Architecture: Mixture of Experts (MoE)
License: Apache 2.0
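The hybrid reasoning described above is typically toggled when the chat prompt is formatted. Below is a minimal sketch using Hugging Face Transformers; the checkpoint name `Qwen/Qwen3-235B-A22B` and the `enable_thinking` argument of the Qwen3 chat template are the assumptions it relies on.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face checkpoint name for the 235B MoE flagship.
model_name = "Qwen/Qwen3-235B-A22B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # let the checkpoint choose its dtype
    device_map="auto",    # shard the MoE weights across available GPUs
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# enable_thinking=True asks the chat template to open a reasoning block
# before the final answer; set it to False for plain, direct generation.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

In multi-turn chat the same behaviour can reportedly also be steered with `/think` and `/no_think` soft switches inside a user message; check the current model card before relying on either mechanism.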
Specifications
- Parameters
- 235B (22B active)
- Architecture
- Mixture of Experts (MoE)
- License
- Apache 2.0
- Context Window
- 128,000 tokens
- Max Output
- 16,384 tokens
- Training Data Cutoff
- Apr 2025
- Type
- text
- Modalities
- text
Benchmark Scores
Advanced Specifications
- Model Family
- Qwen
- API Access
- Available (see the API sketch after this list)
- Chat Interface
- Available
- Multilingual Support
- Yes
- Variants
- Qwen3-30B-A3B (MoE), Qwen3-32B (Dense), Qwen3-14B (Dense), Qwen3-8B (Dense), Qwen3-4B (Dense), Qwen3-1.7B (Dense), Qwen3-0.6B (Dense)
- Hardware Support
- CUDA, TPU
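API access is typically through an OpenAI-compatible endpoint on Alibaba Cloud Model Studio (DashScope), as sketched below; the base URL and the model identifier `qwen3-235b-a22b` are assumptions to verify against the current service documentation.

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and hosted model id.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-235b-a22b",
    messages=[{"role": "user", "content": "Summarise the Qwen3 release in two sentences."}],
)
print(response.choices[0].message.content)
```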
Capabilities & Limitations
- Capabilities
- hybrid reasoning, thinking mode, non-thinking mode, mathematics, coding, logical reasoning, function-calling, tool use, creative writing, multilingual
- Notable Use Cases
- agent-based applications, complex reasoning tasks, multilingual applications, Alibaba's Quark AI assistant, scientific research, agentic workflows
- Function Calling Support
- Yes (see the tool-calling sketch after this list)
- Tool Use Support
- Yes
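Function calling and tool use follow the JSON-schema tool format accepted by OpenAI-compatible clients. The sketch below reuses the assumed hosted endpoint from the API example above; the `get_weather` tool and its schema are purely illustrative.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

# Hypothetical tool definition; the model decides whether to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen3-235b-a22b",  # assumed hosted model id
    messages=[{"role": "user", "content": "What's the weather in Hangzhou right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```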