
DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It uses Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, with an auxiliary-loss-free load-balancing strategy and a multi-token-prediction training objective. It was pre-trained on 14.8T high-quality tokens using only 2.788M H800 GPU hours.
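The "37B of 671B activated" figure follows from MoE routing: a gate scores all experts for each token and only the top-k actually run. A minimal sketch of such top-k gating, with toy sizes (the 16-expert / top-4 setup below is illustrative, not DeepSeek-V3's actual configuration):

```python
import math

def top_k_gating(logits, k):
    """Select the k highest-scoring experts and softmax-normalize
    their weights. Only these experts process the token, so only a
    fraction of the model's total parameters is active per token."""
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in topk}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Toy example: 16 experts, route each token to the top 4.
gates = top_k_gating([0.1 * i for i in range(16)], k=4)
```

Here `gates` maps the 4 selected expert indices to weights summing to 1; the token's output is the gate-weighted sum of those experts' outputs.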
Specifications
- Parameters
- 671B total, 37B activated
- Architecture
- Mixture of Experts (MoE)
- License
- MIT
- Context Window
- 128,000 tokens
- Training Data Cutoff
- 2024-12
- Type
- text
- Modalities
- text
Benchmark Scores
- MMLU: Massive Multitask Language Understanding tests knowledge across 57 subjects, including mathematics.
- MMLU-Pro: an enhanced benchmark with over 12,000 challenging questions across 14 domains.
- DROP: Discrete Reasoning Over Paragraphs requires models to resolve references in a passage and perform discrete operations over them, such as addition, counting, or sorting.
- HellaSwag: tests common-sense natural language inference through completion of scenarios.
- GSM8K: Grade School Math 8K consists of 8.5K high-quality grade school math word problems.
- MATH: a dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning.
- HumanEval: evaluates code generation by asking models to complete Python functions based on docstrings.
- Codeforces: evaluates models on competitive programming problems from the Codeforces platform.
- AIME 2024: American Invitational Mathematics Examination 2024 problems.
- SimpleQA: a benchmark of simple but precise questions to test factual knowledge and reasoning.
- Multilingual coding: tests models on their ability to write code in multiple programming languages.
- MGSM: Multilingual Grade School Math extends GSM8K to 10 languages.
Advanced Specifications
- Model Family
- DeepSeek
- API Access
- Available
- Chat Interface
- Available
- Multilingual Support
- Yes
- Variants
- Base, Chat, FP8, BF16
- Hardware Support
- CUDA, AMD GPU, Huawei Ascend NPU
Capabilities & Limitations
- Capabilities
- reasoning, code, math, multilingual, long-context, multi-token-prediction
- Function Calling Support
- Yes
- Tool Use Support
- Yes
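Since the card lists API access plus function calling and tool use, here is a sketch of how a tool-enabled chat request might be assembled for an OpenAI-compatible chat-completions endpoint. The model name, tool name, and schema below are illustrative assumptions, not an official example:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat endpoint.
# "deepseek-chat" and the get_weather tool are illustrative only.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize for the HTTP request body (transport layer omitted here).
body = json.dumps(payload)
```

When the model decides a tool is needed, the response carries a tool call (function name plus JSON arguments) instead of plain text; the caller executes the function and sends the result back as a follow-up message.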