Qwen2.5 72B
The flagship dense model of the Qwen2.5 series, pretrained on 18 trillion tokens. Compared with Qwen2, it brings significant gains in knowledge, coding, and mathematics, and adds structured output generation (e.g. JSON) and long-context understanding.
2024-09-19
72B
Decoder-only Transformer with Group Query Attention (GQA)
Qwen License
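The description above mentions structured (JSON) output. A minimal sketch of requesting it through an OpenAI-compatible endpoint (e.g. a local vLLM server) hosting the Instruct variant; the model name, `response_format` field, and prompt are illustrative assumptions, not an official API reference.

```python
import json

def build_json_request(user_prompt: str) -> dict:
    """Build a chat-completion payload that asks for a JSON-only reply.

    The model name and the response_format constraint are assumptions
    about a typical OpenAI-compatible serving setup, not guarantees.
    """
    return {
        "model": "Qwen2.5-72B-Instruct",
        "messages": [
            {"role": "system",
             "content": "Reply only with a valid JSON object."},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
    }

payload = build_json_request('Give the capital of France as {"capital": ...}')
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the server's `/v1/chat/completions` route; only the dict construction is shown here so the sketch stays self-contained.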
Specifications
- Parameters
- 72B
- Architecture
- Decoder-only Transformer with Group Query Attention (GQA)
- License
- Qwen License
- Context Window
- 128,000 tokens
- Max Output
- 8,192 tokens
- Training Data Cutoff
- Sep 2024
- Type
- text
- Modalities
- text
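One practical consequence of the spec table: with a 128,000-token context window and an 8,192-token max output, the prompt (including any system message and tool definitions) must fit in the remainder. A small sketch of that budget arithmetic:

```python
# Figures taken from the spec table above.
CONTEXT_WINDOW = 128_000   # total tokens the model can attend over
MAX_OUTPUT = 8_192         # maximum tokens the model may generate

def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the requested output."""
    if reserved_output > CONTEXT_WINDOW:
        raise ValueError("cannot reserve more output than the window holds")
    return CONTEXT_WINDOW - reserved_output

print(max_prompt_tokens())  # 119808
```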
Advanced Specifications
- Model Family
- Qwen
- API Access
- Available
- Chat Interface
- Available
- Multilingual Support
- Yes
- Variants
- Qwen2.5-72B-Instruct, Qwen2.5-72B (Base)
- Hardware Support
- CUDA, TPU
Capabilities & Limitations
- Capabilities
- reasoning, coding, mathematics, multilingual, structured output, role-play
- Known Limitations
- Commercial use is restricted by the proprietary Qwen License
- Notable Use Cases
- Enterprise applications, complex reasoning, creative writing
- Function Calling Support
- Yes
- Tool Use Support
- Yes
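Since the card lists function calling and tool use as supported, here is a minimal sketch of a tool definition in the OpenAI-style schema commonly used with Qwen2.5 chat templates, plus decoding of a tool call the model might emit. The function name and fields are illustrative assumptions.

```python
import json

# Hypothetical tool definition; the schema shape (type/function/parameters)
# follows the common OpenAI-style convention, not an official Qwen spec.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# A tool call the model might emit; arguments typically arrive as a
# JSON-encoded string that the caller must parse before dispatching.
model_tool_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
args = json.loads(model_tool_call["arguments"])
print(args["city"])  # Paris
```

In a real loop the parsed arguments would be passed to the matching local function and the result appended to the conversation as a tool message.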