Humanitys-Last-Exam
diverse | Pending Verification
A challenging benchmark of novel problems designed to test the limits of AI capabilities.
Published: 2023
Score Range: 0-100
Top Score: 38.6
Humanitys-Last-Exam Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Grok 4 | xAI | 38.6 | Unknown | 2025-07-09 | Multimodal |
2 | Gemini 2.5 Pro | | 17.8 | | 2025-05-06 | Multimodal |
3 | o4-mini | OpenAI | 17.7 | | 2025-04-16 | Multimodal |
4 | Gemini 2.5 Flash | | 11 | | 2025-05-20 | Multimodal |
5 | Gemini 2.5 Flash-Lite | | 6.9 | | 2025-06-17 | Multimodal |
About Humanitys-Last-Exam
Methodology
Humanitys-Last-Exam scores are reported on a scale of 0 to 100, where higher scores indicate better performance. The scoring system and methodology are described in the original paper.
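As an illustration of how the 0 to 100 scale and the "higher is better" rule determine the leaderboard order, here is a minimal Python sketch. It does not reproduce the benchmark's scoring method, which is described only in the original paper. The names `Entry` and `rank_entries` are hypothetical, and the scores are copied from the leaderboard table on this page.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    score: float


# Scores copied from the leaderboard table on this page, deliberately unsorted.
ENTRIES = [
    Entry("Gemini 2.5 Flash", 11.0),
    Entry("Grok 4", 38.6),
    Entry("Gemini 2.5 Flash-Lite", 6.9),
    Entry("o4-mini", 17.7),
    Entry("Gemini 2.5 Pro", 17.8),
]


def rank_entries(
    entries: List[Entry], low: float = 0.0, high: float = 100.0
) -> List[Tuple[int, str, float]]:
    """Check that every score lies in [low, high], then rank by descending score."""
    for entry in entries:
        if not low <= entry.score <= high:
            raise ValueError(
                f"{entry.model}: score {entry.score} is outside {low}-{high}"
            )
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return [(rank, e.model, e.score) for rank, e in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, model, score in rank_entries(ENTRIES):
        print(f"{rank} | {model} | {score}")
```

Sorting in descending order reproduces the ranks shown in the table, and the range check rejects any value that falls outside the stated score range.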
Publication
This benchmark was published in 2023.