Qwen2.5-Omni 7B
An end-to-end multimodal model featuring a 'Thinker-Talker' architecture. It accepts text, image, audio, and video inputs and generates real-time streaming speech and text outputs without external ASR/TTS systems.
2025-03-27
7B
Thinker-Talker Architecture
Apache 2.0
Specifications
- Parameters
- 7B
- Architecture
- Thinker-Talker Architecture
- License
- Apache 2.0
- Context Window
- 32,768 tokens
- Max Output
- 4,096 tokens
- Training Data Cutoff
- Mar 2025
- Type
- multimodal
- Modalities
- text, image, audio, video
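The context window and max output figures above imply a rough prompt budget. The sketch below works through that arithmetic. It assumes prompt tokens and generated tokens share the same 32,768-token window, which this page does not state. The function names are hypothetical and are not part of any Qwen API.

```python
# Figures taken from the specification list above.
CONTEXT_WINDOW = 32_768  # tokens
MAX_OUTPUT = 4_096       # tokens


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return how many prompt tokens fit when `reserved_output` tokens are held back.

    Assumption (not confirmed by the spec sheet): prompt and generated
    tokens are counted against the same context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt of `prompt_tokens` leaves room for the reply."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())       # 28672
    print(fits(30_000))              # False
    print(fits(30_000, 1_024))       # True
```

Under that assumption, reserving the full 4,096 output tokens leaves 28,672 tokens for the prompt. Multimodal inputs (image, audio, video) are likely to consume tokens as well, so the usable text budget may be smaller in practice.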
Advanced Specifications
- Model Family
- Qwen
- API Access
- Available
- Chat Interface
- Available
- Multilingual Support
- Yes
- Hardware Support
- CUDA, Mobile
Capabilities & Limitations
- Capabilities
- real-time speech interaction, video understanding, audio analysis, speech generation
- Notable Use Cases
- Voice assistants, Real-time translation, Video chat
- Function Calling Support
- Yes
- Tool Use Support
- Yes