Mixtral 8×7B
A sparse Mixture-of-Experts model with 8 experts per layer (2 routed to each token), totalling 46.7B parameters of which only 12.9B are active per token. It outperforms or matches dense models such as Llama 2 70B and GPT-3.5 on many benchmarks at a significantly lower per-token compute cost.
Released: 2023-12-01
Parameters: 46.7B (8×7B)
Architecture: Mixture-of-Experts Transformer
License: Apache 2.0
Specifications
- Parameters: 46.7B (8×7B)
- Architecture: Mixture-of-Experts Transformer
- License: Apache 2.0
- Context Window: 32,768 tokens
- Type: text
- Modalities: text
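The sparse routing described in the summary means only 2 of the 8 expert feed-forward blocks run for each token. Below is a minimal sketch of such a top-2 Mixture-of-Experts layer; names and dimensions are illustrative, not Mixtral's actual implementation.

```python
# Illustrative top-2 Mixture-of-Experts routing (toy dimensions, not Mixtral's real code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward block (a plain MLP here for brevity).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                             # (tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)   # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)                # normalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Only 2 of the 8 experts execute per token, which is why roughly 12.9B of the
# 46.7B total parameters are active for any given token.
tokens = torch.randn(5, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([5, 64])
```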
Advanced Specifications
- Model Family: Mixtral
- API Access: Not Available
- Chat Interface: Not Available
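Although no hosted API or chat interface is listed here, the Apache 2.0 license means the weights can be run locally. A minimal sketch using the Hugging Face transformers library, assuming the public mistralai/Mixtral-8x7B-v0.1 checkpoint and enough GPU memory for the full model:

```python
# Minimal local-inference sketch (assumes the public "mistralai/Mixtral-8x7B-v0.1"
# checkpoint on Hugging Face and sufficient GPU memory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision still needs on the order of 90 GB of VRAM
    device_map="auto",           # shard across available GPUs
)

inputs = tokenizer("Sparse Mixture-of-Experts models work by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```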