LiveBench
multi-domain | Verified
A contamination-limited benchmark with frequently-updated questions from recent sources, scoring answers automatically against objective ground-truth values. Covers math, coding, reasoning, language, instruction following, and data analysis tasks.
Published: 2024
Scale: 0-100
Top Score: 77.1
LiveBench Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Qwen-3 | Alibaba | 77.1 | 235B (22B active) | 2025-04-29 | Text |
About LiveBench
Methodology
LiveBench evaluates models on a scale of 0 to 100. Higher scores indicate better performance. For detailed information about the methodology, please refer to the original paper.
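As a rough illustration of what automatic scoring against ground truth on a 0 to 100 scale can look like, here is a minimal sketch. It is not LiveBench's actual scoring code: the normalized exact-match check, the equal weighting of categories, and the names `score_answer` and `benchmark_score` are all assumptions made for this example. The original paper describes the real per-task scoring rules.

```python
from statistics import mean


def score_answer(prediction: str, ground_truth: str) -> float:
    """Return 1.0 if the prediction matches the ground truth, else 0.0.

    Assumption: a case- and whitespace-insensitive exact match. Real tasks
    may use task-specific scoring instead.
    """
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(prediction) == normalize(ground_truth) else 0.0


def benchmark_score(results: dict[str, list[tuple[str, str]]]) -> float:
    """Average the per-category accuracies and scale the result to 0-100.

    `results` maps a category name to (prediction, ground_truth) pairs.
    Assumption: every category carries equal weight.
    """
    category_scores = [
        mean([score_answer(pred, truth) for pred, truth in pairs])
        for pairs in results.values()
    ]
    return 100.0 * mean(category_scores)


# Hypothetical data: two categories, three questions in total.
example = {
    "math": [("42", "42"), ("7", "8")],  # 1 of 2 correct
    "coding": [("ok", "OK ")],           # 1 of 1 correct after normalization
}
print(benchmark_score(example))
```

Averaging per category first keeps a category with many questions from dominating the overall score, which is one common design choice for multi-domain benchmarks.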
Publication
This benchmark was published in 2024.