
DeepSeek LLM
DeepSeek LLM is an advanced language model trained from scratch on a 2-trillion-token dataset of English and Chinese text. The flagship 67-billion-parameter model outperforms Llama 2 70B Base in reasoning, coding, math, and Chinese comprehension. The family is released in Base and Chat variants at 7B and 67B parameter sizes.
Specifications
- Parameters
- 67B
- Architecture
- Decoder-only Transformer with Grouped-Query Attention (GQA)
- License
- DeepSeek Model License
- Context Window
- 4,096 tokens
- Training Data Cutoff
- 2023-10
- Type
- text
- Modalities
- text
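The specifications above map directly onto standard open-weight tooling. Below is a minimal loading sketch, assuming the Hugging Face Transformers library and the checkpoints published under the deepseek-ai organization on Hugging Face; the 7B base repo is used for illustration, and the 67B variant loads the same way.

```python
# Minimal sketch: load a DeepSeek LLM base checkpoint with Transformers.
# Assumes the deepseek-ai/deepseek-llm-7b-base repo on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to halve memory vs. fp32
    device_map="auto",           # spread layers across available devices
)

# Base models are plain completion models: prompt in, continuation out.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```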
Benchmark Scores
- HellaSwag
- Tests common sense natural language inference through completion of everyday scenarios.
- Massive Multitask Language Understanding (MMLU)
- Tests knowledge across 57 subjects, including mathematics, history, computer science, and law.
- Grade School Math 8K (GSM8K)
- Consists of 8.5K high-quality grade school math word problems.
- HumanEval
- Evaluates code generation capabilities by asking models to complete Python functions based on docstrings.
Advanced Specifications
- Model Family
- DeepSeek
- API Access
- Available
- Chat Interface
- Available
- Multilingual Support
- Yes
- Variants
- DeepSeek LLM 7B Base, DeepSeek LLM 7B Chat, DeepSeek LLM 67B Base, DeepSeek LLM 67B Chat, GGUF, GPTQ (see the GGUF loading sketch after this list)
- Hardware Support
- CUDA, TPU
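For the GGUF exports listed under Variants, a quantized model can be run locally with llama-cpp-python. This is a hedged sketch: the file name below is hypothetical and stands in for whichever GGUF conversion you have downloaded.

```python
# Minimal sketch: run a GGUF quantization of DeepSeek LLM with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-llm-7b-base.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,        # matches the model's 4,096-token context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm("The capital of France is", max_tokens=32)
print(out["choices"][0]["text"])
```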
Capabilities & Limitations
- Capabilities
- code, math, reasoning, multilingual, Chinese language mastery
- Known Limitations
- Over-reliance on training data biases, hallucination in factual responses, repetition in generated text, possible generation of biased or discriminatory content
- Notable Use Cases
- coding assistant, mathematical problem solving, Chinese-English bilingual tasks, reasoning and analysis
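To illustrate the chat variants behind these use cases, here is a minimal dialogue sketch assuming the deepseek-ai/deepseek-llm-7b-chat checkpoint and that its tokenizer ships a chat template usable via Transformers' apply_chat_template.

```python
# Minimal chat sketch, assuming the deepseek-ai/deepseek-llm-7b-chat repo
# and a chat template bundled with its tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain the Pythagorean theorem in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that the prompt plus generated tokens must fit within the 4,096-token context window listed in the specifications.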