Qwen3-VL 235B (MoE)
The multimodal flagship of the Qwen3 series. It combines Interleaved-MRoPE for spatial-temporal position modeling with DeepStack for fine-grained vision-language alignment, and natively supports interleaved contexts of up to 256K tokens.
Released: 2025-10-21
Parameters: 235B (22B active)
Architecture: Mixture of Experts (MoE) + Vision Transformer
License: Apache 2.0
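As a quickstart, the sketch below shows image-plus-text inference with Hugging Face transformers. It is a minimal sketch under stated assumptions, not official usage: the repo id `Qwen/Qwen3-VL-235B-A22B-Instruct`, the generic `AutoModelForImageTextToText` loading path, and the example image URL are assumptions based on common transformers conventions for Qwen VL checkpoints.

```python
# Minimal image+text inference sketch; repo id and settings are assumptions.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-235B-A22B-Instruct"  # assumed Hugging Face repo id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the checkpoint's dtype
    device_map="auto",    # shard the 235B MoE across available GPUs
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # placeholder URL
            {"type": "text", "text": "Describe what this chart shows."},
        ],
    }
]

# Recent transformers versions can tokenize multimodal chat templates directly.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```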
Specifications
- Parameters: 235B (22B active)
- Architecture: Mixture of Experts (MoE) + Vision Transformer
- License: Apache 2.0
- Context Window: 256,000 tokens
- Max Output: 16,384 tokens
- Training Data Cutoff: Oct 2025
- Type: Multimodal
- Modalities: text, image, video (see the video sketch after this list)
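Because video frames are tokenized into the same interleaved context as text, long recordings can be analyzed in a single pass within the 256K-token budget. The sketch below reuses the `processor` and `model` from the quickstart above; the `"video"` content type and the local `demo.mp4` path follow recent transformers chat-template conventions and are assumptions, not a confirmed Qwen3-VL API.

```python
# Hedged long-video sketch; the "video" content type is an assumption
# based on recent transformers conventions and may vary by version.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "demo.mp4"},  # hypothetical local file
            {"type": "text", "text": "Summarize this video and give a "
                                     "timestamp for each scene change."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    # Sampled frames count against the 256K-token context window.
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```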
Advanced Specifications
- Model Family: Qwen
- API Access: Available
- Chat Interface: Available
- Multilingual Support: Yes
- Variants: Qwen3-VL-30B-A3B (MoE), Qwen3-VL-32B (Dense), Qwen3-VL-8B (Dense), Qwen3-VL-4B (Dense), Qwen3-VL-2B (Dense)
- Hardware Support: CUDA
Capabilities & Limitations
- Capabilities: visual reasoning, long-context video understanding, visual agent, 3D grounding
- Notable Use Cases: robotics, long video analysis, visual coding
- Function Calling Support: Yes
- Tool Use Support: Yes (see the sketch after this list)
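Since function calling and tool use are supported, the model can be driven through the standard `tools` interface of an OpenAI-compatible server. The sketch below is an illustration, not official guidance: the locally served endpoint (for example via vLLM) at `http://localhost:8000/v1`, the served model name, and the `get_weather` tool are all assumptions.

```python
# Function-calling sketch against an assumed OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Instruct",  # assumed served model name
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as JSON.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

In the usual loop, the application executes the requested tool, appends a `role: "tool"` message containing the result, and calls the model again so it can compose the final answer.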