
Qwen2.5-Omni 7B

Alibaba
Open Weights
Pending Human Review

An end-to-end multimodal model built on a "Thinker-Talker" architecture. It accepts text, image, audio, and video inputs and generates text and speech responses in real time as a stream, without relying on external automatic speech recognition (ASR) or text-to-speech (TTS) systems.
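The streaming hand-off between the two components can be pictured with a toy sketch. It assumes a simplified reading of the Thinker-Talker split: a Thinker emits text tokens together with an internal representation, and a Talker turns each step into a speech token as soon as that step arrives, so audio output can begin before the full answer exists. The functions `thinker`, `talker`, and `respond`, and the arithmetic that fakes hidden states and codec ids, are hypothetical; no neural network is involved.

```python
from typing import Iterator, List, Tuple

# Toy stand-ins only: the real Thinker and Talker are neural networks.
# These generators mimic the data flow, not the computation.

def thinker(prompt: str) -> Iterator[Tuple[str, List[float]]]:
    """Hypothetical Thinker: yields (text_token, hidden_state) one step at a time."""
    for i, word in enumerate(prompt.split()):
        # Fake "hidden state" standing in for the real internal representation.
        hidden = [float(len(word)), float(i)]
        yield word.upper(), hidden

def talker(stream: Iterator[Tuple[str, List[float]]]) -> Iterator[Tuple[str, int]]:
    """Hypothetical Talker: maps each Thinker step to a fake speech-codec token id."""
    for text_token, hidden in stream:
        speech_token = int(sum(hidden)) % 1024  # placeholder, not a real codec
        yield text_token, speech_token

def respond(prompt: str) -> Tuple[str, List[int]]:
    """Consume both streams together; playback could start after the first step."""
    text_out: List[str] = []
    speech_out: List[int] = []
    for text_token, speech_token in talker(thinker(prompt)):
        text_out.append(text_token)
        speech_out.append(speech_token)
    return " ".join(text_out), speech_out
```

Because both functions are generators, the Talker processes each Thinker step lazily instead of waiting for the complete text, which is the property the toy is meant to illustrate.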

2025-03-27
7B
Thinker-Talker Architecture
Apache 2.0

Specifications

Parameters
7B
Architecture
Thinker-Talker Architecture
License
Apache 2.0
Context Window
32,768 tokens
Max Output
4,096 tokens
Training Data Cutoff
Mar 2025
Type
Multimodal
Modalities
Text, image, audio, video
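The limits above can be turned into a small pre-flight check before sending a request. This sketch assumes that prompt tokens and generated tokens share the 32,768-token context window and that `prompt_tokens` already counts whatever tokens the image, audio, or video inputs expand into; neither assumption is stated on this card. `Request` and `output_budget` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Set

# Values copied from the specification list above.
CONTEXT_WINDOW = 32_768
MAX_OUTPUT = 4_096
SUPPORTED_INPUTS = {"text", "image", "audio", "video"}

@dataclass
class Request:
    input_modalities: Set[str]
    prompt_tokens: int  # assumed to include tokens produced from media inputs

def output_budget(req: Request) -> int:
    """Largest max-new-tokens value that fits, assuming prompt + output share the context."""
    unsupported = req.input_modalities - SUPPORTED_INPUTS
    if unsupported:
        raise ValueError(f"unsupported modalities: {sorted(unsupported)}")
    remaining = CONTEXT_WINDOW - req.prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone exceeds the context window")
    # The per-response cap still applies even when plenty of context remains.
    return min(MAX_OUTPUT, remaining)
```

For example, a 1,000-token text-and-video prompt is limited by the 4,096-token output cap, while a 30,000-token prompt leaves only 2,768 tokens of room.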


Advanced Specifications

Model Family
Qwen
API Access
Available
Chat Interface
Available
Multilingual Support
Yes
Hardware Support
CUDA, Mobile
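When choosing CUDA or mobile hardware, a first-order estimate of weight memory follows from the parameter count multiplied by the bytes per parameter. This sketch uses the nominal 7B figure from this card; the released checkpoint's exact parameter count may differ, and activations, KV cache, and runtime overhead are excluded. The dtype table and `weight_memory_gib` are illustrative assumptions, not official requirements.

```python
# Nominal parameter count from the spec sheet; the real checkpoint may differ.
PARAMS = 7e9

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(dtype: str, params: float = PARAMS) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30

for name in BYTES_PER_PARAM:
    print(f"{name}: {weight_memory_gib(name):.2f} GiB")
```

At bf16 this gives roughly 13.04 GiB for weights alone, and about 3.26 GiB at 4-bit precision, which is why lower-precision formats matter for memory-constrained targets.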

Capabilities & Limitations

Capabilities
Real-time speech interaction, video understanding, audio analysis, speech generation
Notable Use Cases
Voice assistants, real-time translation, video chat
Function Calling Support
Yes
Tool Use Support
Yes
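On the application side, function calling usually means parsing tool calls out of the model's text and running local functions. This sketch assumes the Hermes-style convention used by Qwen2.5 text models, where each call is a JSON object with `name` and `arguments` wrapped in `<tool_call>...</tool_call>` tags; this card does not confirm that format for Qwen2.5-Omni, so check the model's chat template. The `get_weather` tool and `dispatch_tool_calls` helper are hypothetical.

```python
import json
import re
from typing import Any, Callable, Dict, List

# Assumed format (borrowed from Qwen2.5 text models): JSON inside <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def get_weather(city: str) -> str:
    """Hypothetical tool used only for illustration."""
    return f"Sunny in {city}"

TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def dispatch_tool_calls(model_output: str) -> List[Any]:
    """Run every tool call found in the model output, in order of appearance."""
    results: List[Any] = []
    for payload in TOOL_CALL_RE.findall(model_output):
        call = json.loads(payload)
        fn = TOOLS.get(call["name"])
        if fn is None:
            raise KeyError(f"unknown tool: {call['name']}")
        # Arguments are assumed to be a JSON object, not a JSON-encoded string.
        results.append(fn(**call.get("arguments", {})))
    return results
```

The results would then be sent back to the model as tool messages so it can compose a final text or speech reply.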
