GPT-2
Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. It was pre-trained on WebText, a dataset of 8 million web pages, and is a direct scale-up of GPT-1, with roughly a ten-fold increase in both parameter count and training dataset size. OpenAI initially withheld the larger versions due to concerns about potential misuse, and released the full 1.5B-parameter model in November 2019.
Specifications
- Release Date: 2019-02-14
- Parameters: 1.5B
- Architecture: Decoder-only Transformer
- License: MIT
- Context Window: 1,024 tokens (see the usage sketch after this list)
- Training Data Cutoff: December 2017
- Type: text
- Modalities: text
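As a concrete illustration of how these specifications are exercised in practice, here is a minimal usage sketch. It assumes the Hugging Face `transformers` library and its publicly hosted `gpt2` checkpoint, both external to this card and to OpenAI's original release; the prompt and sampling settings are likewise illustrative.

```python
# Minimal sketch: load the smallest publicly hosted GPT-2 checkpoint and
# generate a continuation. Assumes the Hugging Face `transformers` library.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "GPT-2 is a decoder-only transformer that"
inputs = tokenizer(prompt, return_tensors="pt")

# Autoregressive decoding: each new token is predicted from at most
# 1,024 preceding tokens, the context window listed above.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_k=40,                              # top-k truncated sampling
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern applies to the larger variants; only the checkpoint name changes.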
Advanced Specifications
- Model Family: GPT
- Finetuned From: None (trained from scratch; successor to GPT-1)
- API Access: Not Available
- Chat Interface: Not Available
- Multilingual Support: No
- Variants: 117M parameters (12 layers), 345M parameters (24 layers), 762M parameters (36 layers), 1.5B parameters (48 layers); see the checkpoint sketch below
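The four sizes above are commonly identified with the `gpt2`, `gpt2-medium`, `gpt2-large`, and `gpt2-xl` checkpoints on the Hugging Face Hub; that naming is a convention of that ecosystem rather than of the original release, and the hosted configurations may report slightly different parameter counts than the figures listed here. The sketch below inspects each configuration to show how depth and width scale across the family.

```python
# Illustrative sketch: compare the architecture of the four GPT-2 sizes by
# reading their configurations (no model weights are downloaded).
from transformers import AutoConfig

CHECKPOINTS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]

for name in CHECKPOINTS:
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name:12s} layers={cfg.n_layer:2d} hidden={cfg.n_embd:4d} "
          f"heads={cfg.n_head:2d} context={cfg.n_positions}")
```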
Capabilities & Limitations
- Capabilities: text generation, translation, summarization, question answering
- Known Limitations: becomes repetitive or nonsensical in long passages; lacks coherence in longer texts; resource-intensive to deploy (decoding-time mitigations are sketched below)
- Notable Use Cases: AI Dungeon text adventures, the r/SubSimulatorGPT2 subreddit, code autocompletion, counselor training simulations
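The repetition and coherence issues noted above are usually mitigated at decoding time rather than by changing the model. The sketch below, again assuming the Hugging Face `transformers` API with illustrative parameter values, contrasts greedy decoding, which tends to loop on long outputs, with nucleus sampling plus a mild repetition penalty.

```python
# Decoding-time mitigation of repetition (illustrative settings, not tuned).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The old lighthouse keeper", return_tensors="pt")

# Greedy decoding: deterministic, but prone to repeating phrases.
greedy = model.generate(**inputs, max_new_tokens=80, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)

# Nucleus (top-p) sampling with a mild repetition penalty reduces looping.
sampled = model.generate(**inputs, max_new_tokens=80, do_sample=True,
                         top_p=0.9, temperature=0.8, repetition_penalty=1.2,
                         pad_token_id=tokenizer.eos_token_id)

for label, ids in (("greedy", greedy), ("sampled", sampled)):
    print(f"--- {label} ---")
    print(tokenizer.decode(ids[0], skip_special_tokens=True))
```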