Qwen3-Omni 30B (MoE)

Alibaba | Open Weights | Pending Human Review

A native end-to-end omni-modal model that processes text, images, audio, and video. It can understand audio inputs of up to 40 minutes and uses a 'Thinker-Talker' MoE architecture for low-latency streaming.

Released: 2025-09-22

Specifications

Parameters: 30B (3B active)
Architecture: Mixture of Experts (MoE) Thinker-Talker
License: Apache 2.0
Context Window: 128,000 tokens
Max Output: 16,384 tokens
Training Data Cutoff: Sep 2025
Type: multimodal
Modalities: text, image, audio, video
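
The figures above imply two pieces of simple arithmetic, sketched below in Python. The helper names `active_fraction` and `prompt_budget` are hypothetical and not part of any Qwen tooling. The prompt budget also rests on an assumption this page does not confirm: that generated tokens count against the same 128,000-token context window.

```python
# Values copied from the Specifications list above.
TOTAL_PARAMS_B = 30        # total parameters, in billions
ACTIVE_PARAMS_B = 3        # parameters active per token, in billions
CONTEXT_WINDOW = 128_000   # tokens
MAX_OUTPUT = 16_384        # tokens


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active for each token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


def prompt_budget(context_window: int, reserved_output: int) -> int:
    """Tokens left for the prompt after reserving room for the reply.

    ASSUMPTION: output tokens share the context window with the input.
    """
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"Active parameter share: {share:.0%}")
    print(f"Prompt budget: {prompt_budget(CONTEXT_WINDOW, MAX_OUTPUT)} tokens")
```

Under that assumption, reserving the full maximum output leaves 111,616 tokens for the prompt, and roughly one tenth of the parameters are active per token.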

Advanced Specifications

Model Family: Qwen
API Access: Available
Chat Interface: Available
Multilingual Support: Yes
Hardware Support: CUDA

Capabilities & Limitations

Capabilities: real-time speech, audio understanding, video interaction, streaming synthesis
Notable Use Cases: real-time translation, interactive assistants, audio-visual analysis
Function Calling Support: Yes
Tool Use Support: Yes
