
DeepSeek-V3
A powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, together with an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. The model was pre-trained on 14.8T high-quality tokens using only 2.788M H800 GPU hours.
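The practical effect of the MoE design is that a router selects a small subset of experts for each token, so only a fraction of the total parameters participates in any forward pass. Below is a minimal, illustrative sketch of top-k expert routing in PyTorch; the dimensions, expert count, and top-k value are hypothetical and do not reflect DeepSeek-V3's real configuration.

```python
# Toy top-k MoE layer: only the experts chosen by the router run per token,
# which is why ~37B of 671B parameters are "activated" for each token.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):  # placeholder sizes
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)      # per-token expert choice
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():                              # only chosen experts compute
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(5, 64)
print(ToyMoELayer()(x).shape)                               # torch.Size([5, 64])
```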
Specifications
- Parameters
- 671B total, 37B activated
- Architecture
- Mixture of Experts (MoE)
- License
- MIT
- Context Window
- 128,000 tokens
- Training Data Cutoff
- 2024-12
- Type
- text
- Modalities
- text
Benchmark Scores
- Massive Multitask Language Understanding (MMLU): tests knowledge across 57 subjects, including mathematics, history, computer science, and law.
- MMLU-Pro: an enhanced benchmark with over 12,000 challenging questions across 14 domains, including mathematics, physics, law, and engineering.
- Discrete Reasoning Over Paragraphs (DROP): requires models to resolve references in a passage and perform discrete operations over them, such as addition, counting, or sorting.
- Grade School Math 8K (GSM8K): consists of 8.5K high-quality grade school math word problems.
- A dataset of 12,500 challenging competition mathematics problems requiring multi-step reasoning.
- Evaluates code generation capabilities by asking models to complete Python functions based on docstrings.
- Advanced competitive programming benchmark for evaluating large language models on algorithmic problems.
Advanced Specifications
- Model Family
- DeepSeek
- API Access
- Available (see the usage sketch after this list)
- Chat Interface
- Available
- Multilingual Support
- Yes
- Variants
- Base, Chat, FP8, BF16
- Hardware Support
- CUDA, AMD GPU, Huawei Ascend NPU
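Since API access is listed as available, the sketch below shows a minimal chat request through an OpenAI-compatible client. The base URL and model name ("deepseek-chat") are assumptions based on DeepSeek's published API conventions; verify both against the current API documentation.

```python
# Minimal sketch of calling the model via an OpenAI-compatible endpoint.
# Base URL and model name are assumed; check the current DeepSeek API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                    # placeholder credential
    base_url="https://api.deepseek.com",       # assumed endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                     # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the DeepSeekMoE architecture in two sentences."},
    ],
)
print(response.choices[0].message.content)
```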
Capabilities & Limitations
- Capabilities
- reasoning, code, math, multilingual, long-context, multi-token prediction
- Function Calling Support
- Yes (see the function-calling sketch after this list)
- Tool Use Support
- Yes
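Function calling and tool use are exposed through the same OpenAI-compatible tools interface assumed above. The sketch below defines one hypothetical tool (get_weather) and lets the model decide whether to call it; the tool name and schema are illustrative, not part of the model's specification.

```python
# Sketch of a function-calling request using the OpenAI-compatible tools
# parameter; the weather tool and its schema are hypothetical examples.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                 # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",                     # assumed model identifier
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# If the model chooses to call the tool, the call appears here instead of text.
print(response.choices[0].message.tool_calls)
```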