
DeepSeek LLM

DeepSeek · Open Source · Verified

DeepSeek LLM is an advanced language model comprising 67 billion parameters, trained from scratch on a dataset of 2 trillion tokens in both English and Chinese. It demonstrates superior general capabilities, outperforming Llama 2 70B Base in reasoning, coding, math, and Chinese comprehension, and is available in Base and Chat variants at 7B and 67B parameter sizes.

Released: 2023-11-01

Specifications

Parameters
67B
Architecture
Decoder-only Transformer with Grouped-Query Attention (GQA; see the sketch after this list)
License
DeepSeek Model License
Context Window
4,096 tokens
Training Data Cutoff
2023-10
Type
text
Modalities
text
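The specifications above list grouped-query attention (GQA), in which several query heads share each key/value head, shrinking the KV cache relative to full multi-head attention. Below is a minimal PyTorch sketch of the mechanism; the head counts and projection shapes are illustrative assumptions, not DeepSeek's actual configuration.

```python
# Minimal GQA sketch (illustrative; not DeepSeek's actual head configuration).
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """x: (batch, seq, dim). Query heads outnumber shared key/value heads."""
    b, s, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, s, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of query heads attends to one shared key/value head.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, s, d)

# Toy shapes: 8 query heads sharing 2 KV heads (head_dim = 8).
x = torch.randn(1, 16, 64)
wq = torch.randn(64, 64)
wk, wv = torch.randn(64, 16), torch.randn(64, 16)
y = grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2)  # (1, 16, 64)
```

The practical payoff is a KV cache holding n_kv_heads rather than n_q_heads heads per layer, which matters most when serving long contexts with a large model like the 67B.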

Benchmark Scores

HellaSwag: 84
Tests common-sense natural language inference through completion of scenarios.

TriviaQA: 78.9
MMLU: 71.3
Massive Multitask Language Understanding (MMLU) tests knowledge across 57 subjects, including mathematics, history, and law.

GSM8K: 63.4
Grade School Math 8K (GSM8K) consists of 8.5K high-quality grade-school math word problems.

HumanEval: 42.7
Evaluates code generation capabilities by asking models to complete Python functions based on docstrings (illustrated after this list).

BBH: 68.7
C-Eval: 66.1
CMMLU: 70.8
Chinese QA: 87.6
Hungarian Math Exam: 65
LeetCode Weekly: 42.7
GSM8K (Chat): 84.1
HumanEval (Chat): 73.8
MMLU (Chat): 71.1
TriviaQA (Chat): 81.5
BBH (Chat): 71.7
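To make the HumanEval entry concrete: a task in that style gives the model a function signature and docstring, and the generated body is scored against hidden unit tests. The problem below is a hypothetical stand-in, not an actual benchmark item.

```python
# Illustrative HumanEval-style task (hypothetical, not a real benchmark item).
# The model sees only the signature and docstring and must write the body.
def running_max(xs: list[int]) -> list[int]:
    """Return a list where element i is the maximum of xs[:i + 1]."""
    # A completion like the following would pass the hidden unit tests:
    result: list[int] = []
    for x in xs:
        result.append(x if not result else max(result[-1], x))
    return result

assert running_max([3, 1, 4, 1, 5]) == [3, 3, 4, 4, 5]
```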

Advanced Specifications

Model Family
DeepSeek
API Access
Available (example call after this list)
Chat Interface
Available
Multilingual Support
Yes
Variants
DeepSeek LLM 7B Base, DeepSeek LLM 7B Chat, DeepSeek LLM 67B Base, DeepSeek LLM 67B Chat, GGUF, GPTQ
Hardware Support
CUDA, TPU
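Since API access is listed as available, a minimal chat-completion sketch follows. It assumes DeepSeek's OpenAI-compatible endpoint (`https://api.deepseek.com`) and the `deepseek-chat` model identifier; both are assumptions to verify against the current API reference.

```python
# Minimal sketch of a chat call, assuming an OpenAI-compatible endpoint.
# The base_url and model name are assumptions; check DeepSeek's API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # assumed endpoint
)
response = client.chat.completions.create(
    model="deepseek-chat",  # assumed identifier for the chat model
    messages=[{"role": "user", "content": "Explain grouped-query attention in one sentence."}],
)
print(response.choices[0].message.content)
```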

Capabilities & Limitations

Capabilities
Code, math, reasoning, multilingual, Chinese language mastery
Known Limitations
Over-reliance on training-data biases; hallucination in factual responses; repetition in generated text; may generate biased or discriminatory content
Notable Use Cases
Coding assistant, mathematical problem solving, Chinese-English bilingual tasks, reasoning and analysis (see the local-inference sketch below)
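For use cases like the coding assistant above, the open weights can also be run locally. The sketch below loads the 7B Chat variant with Hugging Face Transformers; the repository id follows DeepSeek's published naming convention (`deepseek-ai/deepseek-llm-7b-chat`) but should be confirmed on the Hub.

```python
# Hedged sketch: local generation with the 7B Chat weights via transformers.
# The repo id is assumed from DeepSeek's naming convention; verify before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "deepseek-ai/deepseek-llm-7b-chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```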
