Qwen2.5 72B

Alibaba, Open Weights, Pending Human Review

Qwen2.5 72B is the flagship dense model of the Qwen2.5 series, pretrained on 18 trillion tokens. Compared with Qwen2, it brings significant improvements in knowledge, coding, and mathematics. It also supports structured output generation (such as JSON) and long-context understanding.
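Because structured JSON output is listed as a feature, a caller would usually still check the reply before using it. The following is a minimal sketch, assuming the reply arrives as a plain string that may or may not be wrapped in a Markdown code fence. The function name `parse_json_reply`, the `required_keys` parameter, and the sample reply are hypothetical and not part of any Qwen API.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example fence-safe


def parse_json_reply(reply: str, required_keys: set) -> dict:
    """Parse a model reply that is expected to contain a single JSON object."""
    text = reply.strip()
    # Some replies wrap the JSON in a Markdown code fence; strip it if present.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical reply, used only to exercise the parser.
sample_reply = FENCE + 'json\n{"name": "Qwen2.5 72B", "parameters": "72B"}\n' + FENCE
parsed = parse_json_reply(sample_reply, {"name", "parameters"})
print(parsed["parameters"])
```

Validating required keys up front keeps malformed or partial replies from propagating into downstream code.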

Released 2024-09-19
72B
Decoder-only Transformer with Grouped Query Attention (GQA)
Qwen License

Specifications

Parameters
72B
Architecture
Decoder-only Transformer with Grouped Query Attention (GQA)
License
Qwen License
Context Window
128,000 tokens
Max Output
8,192 tokens
Training Data Cutoff
Sep 2024
Type
text
Modalities
text
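
To illustrate why GQA matters at the listed 128,000-token context window, here is a rough sketch of key/value cache sizing. The context length comes from the specifications above. The layer count, head counts, head dimension, and 16-bit cache precision are assumptions for illustration and should be checked against the published model configuration.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing one key tensor and one
    value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


CONTEXT_TOKENS = 128_000  # context window from the specifications

# Assumed architecture values (not stated on this page).
LAYERS = 80
QUERY_HEADS = 64
KV_HEADS = 8      # GQA: several query heads share each key/value head
HEAD_DIM = 128

gqa_bytes = kv_cache_bytes(CONTEXT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
# Hypothetical baseline: every query head has its own key/value head.
mha_bytes = kv_cache_bytes(CONTEXT_TOKENS, LAYERS, QUERY_HEADS, HEAD_DIM)

print(f"GQA cache: {gqa_bytes / 2**30:.1f} GiB")
print(f"Full multi-head cache: {mha_bytes / 2**30:.1f} GiB")
print(f"Reduction factor: {mha_bytes // gqa_bytes}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key/value heads, which is the main reason GQA makes long contexts more practical to serve.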

Advanced Specifications

Model Family
Qwen
API Access
Available
Chat Interface
Available
Multilingual Support
Yes
Variants
Qwen2.5-72B-Instruct, Qwen2.5-72B (Base)
Hardware Support
CUDA, TPU

Capabilities & Limitations

Capabilities
reasoning, coding, mathematics, multilingual, structured output, role-play
Known Limitations
Released under the custom Qwen License rather than a standard open-source license, which places conditions on commercial use
Notable Use Cases
Enterprise applications, Complex reasoning, Creative writing
Function Calling Support
Yes
Tool Use Support
Yes
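
As an illustration of how function calling is typically consumed on the application side, the sketch below dispatches a tool call to a local Python function. It assumes the serving stack returns the call as JSON with `name` and `arguments` fields. That shape is an assumption and varies by framework, and the `get_weather` tool and the `TOOLS` registry are hypothetical.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names the model may emit to local callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> str:
    """Run the tool named in a JSON-encoded call and return its result."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    args = call.get("arguments", {})
    # Some stacks deliver arguments as a JSON-encoded string instead of an object.
    if isinstance(args, str):
        args = json.loads(args)
    return TOOLS[name](**args)


# Hypothetical call, used only to exercise the dispatcher.
result = dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
print(result)
```

Restricting dispatch to an explicit registry means the model can only trigger functions the application has deliberately exposed.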
