Qwen3-Omni 30B (MoE)
A native end-to-end omni-modal model that processes text, images, audio, and video. It supports audio understanding for inputs of up to 40 minutes and uses a 'Thinker-Talker' MoE architecture for low-latency streaming.
Release date: 2025-09-22
Specifications
- Parameters: 30B (3B active)
- Architecture: Mixture of Experts (MoE) Thinker-Talker
- License: Apache 2.0
- Context Window: 128,000 tokens
- Max Output: 16,384 tokens
- Training Data Cutoff: Sep 2025
- Type: multimodal
- Modalities: text, image, audio, video
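The figures above lend themselves to some quick capacity arithmetic. The following is a minimal sketch, not a sizing guide: it assumes weights are stored at 16 bits per parameter and that prompt and output tokens share the same context window, neither of which is stated in this listing. The function names are illustrative.

```python
# Values copied from the Specifications list.
TOTAL_PARAMS = 30e9        # 30B total parameters
ACTIVE_PARAMS = 3e9        # 3B parameters active per token
CONTEXT_WINDOW = 128_000   # tokens
MAX_OUTPUT = 16_384        # tokens

# Assumption (not from the listing): 16-bit weights, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def active_fraction() -> float:
    """Share of the parameters that participate in computing each token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt after reserving room for the output.

    Assumption: prompt and output share one context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


if __name__ == "__main__":
    print(f"weights (all experts): {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
    print(f"active per token:      {active_fraction():.0%}")
    print(f"prompt budget:         {max_prompt_tokens():,} tokens")
```

In an MoE model all experts generally have to be resident in memory, so the memory estimate uses the 30B total, while per-token compute scales with the 3B active parameters.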
Advanced Specifications
- Model Family: Qwen
- API Access: Available
- Chat Interface: Available
- Multilingual Support: Yes
- Hardware Support: CUDA
Capabilities & Limitations
- Capabilities: real-time speech, audio understanding, video interaction, streaming synthesis
- Notable Use Cases: real-time translation, interactive assistants, audio-visual analysis
- Function Calling Support: Yes
- Tool Use Support: Yes
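To illustrate what function calling support typically involves on the application side, here is a minimal sketch of a tool definition and a dispatcher. The listing does not specify the wire format the model or its serving stack uses, so this assumes a tool call arrives as a JSON object with `name` and `arguments` fields. The tool `translate_text`, its schema, and `dispatch_tool_call` are hypothetical names introduced for illustration only.

```python
import json
from typing import Any, Callable

# Hypothetical tool description in JSON-Schema style (an assumption,
# not a format documented in this listing).
TRANSLATE_TOOL = {
    "name": "translate_text",
    "description": "Translate text into a target language.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "target_language": {"type": "string"},
        },
        "required": ["text", "target_language"],
    },
}


def translate_text(text: str, target_language: str) -> str:
    """Placeholder; a real tool would call a translation backend."""
    return f"[{target_language}] {text}"


# Maps tool names the model may request to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"translate_text": translate_text}


def dispatch_tool_call(raw_call: str) -> Any:
    """Parse a tool call (assumed JSON with 'name' and 'arguments') and run it."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = call.get("arguments", {})
    if isinstance(args, str):  # some stacks send arguments as a JSON string
        args = json.loads(args)
    return REGISTRY[name](**args)


if __name__ == "__main__":
    example = '{"name": "translate_text", "arguments": {"text": "hello", "target_language": "fr"}}'
    print(dispatch_tool_call(example))
```

The result of `dispatch_tool_call` would normally be sent back to the model as a follow-up message so it can compose its final answer; how that message is shaped depends on the serving API in use.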