
DeepSeek-V3

DeepSeek · Open Source · Verified

A powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It uses Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, with an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. Pre-trained on 14.8T high-quality tokens using only 2.788M H800 GPU hours.
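The "37B activated per token" figure comes from sparse expert routing: a small router picks a few experts per token, so only their parameters run. The sketch below is a generic top-k softmax router for illustration only; DeepSeek-V3's actual routing differs (it adds shared experts and an auxiliary-loss-free balancing scheme), and all names here are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_scores, top_k=2):
    """Pick the top_k experts for one token and return
    (expert_index, normalized_weight) pairs. Only the chosen
    experts run, which is why activated parameters are far
    fewer than total parameters."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# Toy example: 8 experts, route one token to its top 2.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
print(route_token(scores))
```

At scale, each expert is a feed-forward block and the weighted expert outputs are summed to form the layer output; the router itself is just a learned linear projection per token.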

Released: 2024-12-26

Specifications

Parameters: 671B total, 37B activated
Architecture: Mixture of Experts (MoE)
License: MIT
Context Window: 128,000 tokens
Training Data Cutoff: 2024-12
Type: text
Modalities: text

Benchmark Scores

MMLU: 88.5
  Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects, including mathematics.
MMLU-Redux: 89.1
MMLU-Pro: 75.9
  MMLU-Pro is an enhanced benchmark with over 12,000 challenging questions across 14 domains.
DROP: 91.6
  Discrete Reasoning Over Paragraphs (DROP) requires models to resolve references in a passage and perform discrete operations over them (e.g., addition, counting, sorting).
BBH: 87.5
HellaSwag: 88.9
  Tests common-sense natural language inference through scenario completion.
ARC-Challenge: 95.3
ARC-Easy: 98.9
PIQA: 84.7
WinoGrande: 84.9
GSM8K: 89.3
  Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade-school math word problems.
MATH: 90.2
  A dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning.
HumanEval: 82.6
  Evaluates code generation by asking models to complete Python functions based on docstrings.
MBPP: 75.4
LiveCodeBench: 40.5
Codeforces: 51.6
  Evaluates models on competitive programming problems from the Codeforces platform.
AIME 2024: 39.2
  American Invitational Mathematics Examination (AIME) 2024 problems.
GPQA-Diamond: 59.1
SimpleQA: 24.9
  A benchmark of simple but precise questions testing factual knowledge.
LongBench v2: 48.7
SWE-bench Verified: 42.0
Aider-Edit: 79.7
Aider-Polyglot: 49.6
  Tests models on their ability to write code in multiple programming languages.
CNMO 2024: 43.2
C-Eval: 86.5
CMMLU: 88.8
MGSM: 79.8
  Multilingual Grade School Math (MGSM) extends GSM8K to 10 languages.
CMath: 90.7
MMMLU (non-English): 79.4
C-SimpleQA: 64.8
Arena-Hard: 85.5
AlpacaEval 2.0: 70.0

Advanced Specifications

Model Family: DeepSeek
API Access: Available
Chat Interface: Available
Multilingual Support: Yes
Variants: Base, Chat, FP8, BF16
Hardware Support: CUDA, AMD GPU, Huawei Ascend NPU

Capabilities & Limitations

Capabilities: reasoning, code, math, multilingual, long-context, multi-token prediction
Function Calling Support: Yes
Tool Use Support: Yes
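Function calling is exposed through an OpenAI-compatible chat-completions interface, so a tool is declared as a JSON schema attached to the request. The sketch below shows the shape of such a request body; the tool name, its parameters, and the "deepseek-chat" model identifier are illustrative assumptions, not taken from this page.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible schema.
# The tool name and parameters are made up for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body a client would POST to a chat-completions endpoint.
request_body = {
    "model": "deepseek-chat",  # assumed model identifier
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

When the model decides to use the tool, the response contains a tool call with JSON arguments; the client runs the tool and sends the result back as a "tool" role message to continue the conversation.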
