
Qwen3-VL 235B (MoE)

Alibaba · Open Weights · Pending Human Review

The multimodal flagship of the Qwen3 series. It integrates Interleaved-MRoPE for spatial-temporal position modeling and DeepStack for tighter vision-language alignment, and it supports native interleaved contexts of up to 256K tokens.
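
Because the weights are released under Apache 2.0, the model can also be run locally. The sketch below shows one plausible way to send an interleaved image-and-text prompt through Hugging Face transformers; the repository id Qwen/Qwen3-VL-235B-A22B-Instruct, the AutoModelForImageTextToText entry point, and the example image URL are assumptions rather than details taken from this page, so consult the published model card for the exact loading code.

from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-235B-A22B-Instruct"  # assumed repository name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard the 235B MoE across available GPUs
)

# One interleaved image + text turn, matching the model's native input format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/warehouse.jpg"},
            {"type": "text", "text": "Describe the spatial layout and count the shelves."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])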

Release Date
2025-10-21

Specifications

Parameters
235B (22B active)
Architecture
Mixture of Experts (MoE) + Vision Transformer
License
Apache 2.0
Context Window
256,000 tokens
Max Output
16,384 tokens
Training Data Cutoff
Oct 2025
Type
multimodal
Modalities
text, image, video

Benchmark Scores

Advanced Specifications

Model Family
Qwen
API Access
Available (see the API call sketch after this section)
Chat Interface
Available
Multilingual Support
Yes
Variants
Qwen3-VL-30B-A3B (MoE), Qwen3-VL-32B (Dense), Qwen3-VL-8B (Dense), Qwen3-VL-4B (Dense), Qwen3-VL-2B (Dense)
Hardware Support
CUDA
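
API access is listed as available. One common way to call hosted Qwen models is an OpenAI-compatible chat completions endpoint; in the sketch below, the base URL, environment variable name, model id, and image URL are all assumptions to be replaced with whatever your provider documents.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed key variable for the hosting provider
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",  # assumed hosted model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/frame_0001.jpg"}},
                {"type": "text",
                 "text": "What is the robot arm holding in this frame?"},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)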

Capabilities & Limitations

Capabilities
visual reasoning, long-context video understanding, visual agent, 3D grounding
Notable Use Cases
Robotics, long video analysis, visual coding
Function Calling Support
Yes
Tool Use Support
Yes (see the tool-use sketch below)
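
Function calling and tool use are both listed as supported. The sketch below passes a tool schema through the same OpenAI-compatible interface used above; the move_arm tool, its parameters, the endpoint, and the model id are all hypothetical illustrations, not anything published for this model.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed key variable, as in the API sketch above
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "move_arm",  # hypothetical robotics tool
            "description": "Move a robot arm to an object visible in the image.",
            "parameters": {
                "type": "object",
                "properties": {
                    "object_label": {
                        "type": "string",
                        "description": "Label of the target object.",
                    },
                    "grip": {
                        "type": "boolean",
                        "description": "Close the gripper on arrival.",
                    },
                },
                "required": ["object_label"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",  # assumed hosted model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/workbench.jpg"}},
                {"type": "text", "text": "Pick up the red screwdriver."},
            ],
        }
    ],
    tools=tools,
)

# When the model chooses to call a tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))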
