GPT-1
Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models: a 12-layer, decoder-only Transformer trained in two stages, unsupervised generative pre-training on the BookCorpus dataset followed by supervised fine-tuning on each downstream task. This approach yielded improvements across a range of NLP tasks, including natural language inference, question answering, and semantic similarity; a minimal sketch of its masked self-attention appears below.
Released: 2018-06-01
Parameters: 117M
Architecture: 12-layer decoder-only Transformer with 12 masked self-attention heads
License: MIT
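The 12 masked self-attention heads in each decoder block are what make the model autoregressive: a position can attend only to itself and to earlier tokens. The NumPy snippet below is a minimal single-head sketch of that causal masking, using an assumed hidden width of 768 split into 12 heads of 64 (figures taken from the GPT-1 paper, not from this card); it is illustrative rather than OpenAI's implementation.

```python
# Minimal sketch of the masked ("causal") self-attention used in a
# decoder-only Transformer. Shapes and names are illustrative, not OpenAI's code.
import numpy as np

def masked_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)

    # Causal mask: position i must not attend to positions j > i.
    seq_len = x.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                   # (seq_len, d_head)

# Toy usage: 4 tokens, an assumed 768-wide hidden state split across
# 12 heads (768 / 12 = 64 dimensions per head).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 768))
w_q, w_k, w_v = (rng.normal(size=(768, 64)) * 0.02 for _ in range(3))
print(masked_self_attention(x, w_q, w_k, w_v).shape)    # (4, 64)
```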
Specifications
- Parameters
- 117M (a rough parameter breakdown follows this list)
- Architecture
- 12-layer decoder-only Transformer with 12 masked self-attention heads
- License
- MIT
- Context Window
- 512 tokens
- Training Data Cutoff
- 2018
- Type
- text
- Modalities
- text
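The 117M parameter figure can be roughly reproduced from the specifications above. The sketch below does the arithmetic under assumptions taken from the GPT-1 paper rather than from this card: a hidden width of 768, a feed-forward width of 3072, a ~40K-token BPE vocabulary, and an output projection tied to the token embedding.

```python
# Back-of-the-envelope parameter count for a GPT-1-sized decoder-only Transformer.
d_model  = 768       # hidden size (assumed, per the GPT-1 paper)
d_ff     = 3072      # feed-forward inner size (assumed: 4 * d_model)
n_layers = 12        # from the specifications
n_ctx    = 512       # context window, from the specifications
vocab    = 40_478    # approximate BPE vocabulary size (assumed)

# Embeddings: token embeddings (tied with the output projection)
# plus learned position embeddings.
embeddings = vocab * d_model + n_ctx * d_model

# One decoder block: Q, K, V and output projections for the masked
# self-attention (the 12 heads split d_model, so projection shapes do not
# depend on the head count), a two-layer feed-forward network, and two
# layer norms. Bias terms included.
attention   = 4 * (d_model * d_model + d_model)
feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
layer_norms = 2 * 2 * d_model
per_layer   = attention + feedforward + layer_norms

total = embeddings + n_layers * per_layer
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")   # ~117M
```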
Advanced Specifications
- Model Family
- GPT
- API Access
- Not Available
- Chat Interface
- Not Available
- Multilingual Support
- No
Capabilities & Limitations
- Capabilities
- natural language inference, question answering, semantic similarity, text classification
- Known Limitations
- limited context window (512 tokens)
- Notable Use Cases
- text classification, question answering, semantic similarity assessment