LiveBench

multi-domain · Verified

A contamination-limited benchmark whose questions are updated frequently from recent sources and whose answers are scored automatically against objective ground-truth values. It covers math, coding, reasoning, language, instruction following, and data analysis tasks.

Published: 2024
Scale: 0-100
Top Score: 77.1

LiveBench Leaderboard

Rank | Model | Provider | Score | Parameters | Released | Type
1 | Qwen-3 | Alibaba | 77.1 | 235B (22B active) | 2025-04-29 | Text

About LiveBench

Methodology

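LiveBench reports scores on a 0-100 scale, where higher is better. Every question has an objective ground-truth answer, so responses are scored automatically rather than by an LLM or human judge. For the full methodology, including how task and category scores are combined, see the original paper.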
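The scoring described above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: each question earns a score between 0 and 1 against its ground-truth answer (exact match here, though some tasks may award partial credit), and the 0-100 figure is taken as the unweighted mean of category scores, each of which averages its tasks. The `exact_match` and `overall_score` helpers and the `results` data are hypothetical and illustrative only; actual LiveBench tasks use their own task-specific scoring functions.

```python
from statistics import mean

# Hypothetical per-question results: category -> task -> scores in [0, 1].
# Names and numbers are illustrative only, not real LiveBench data.
results = {
    "math": {"task_a": [1, 0, 1, 1], "task_b": [0.5, 1]},
    "coding": {"task_c": [1, 1, 0]},
}


def exact_match(answer: str, ground_truth: str) -> float:
    """Return 1.0 if the normalized answer equals the ground truth, else 0.0.

    Simplified stand-in (assumption): real tasks use task-specific scorers.
    """
    return float(answer.strip().lower() == ground_truth.strip().lower())


def overall_score(results: dict[str, dict[str, list[float]]]) -> float:
    """Average questions per task, tasks per category, then categories.

    Assumes unweighted means at every level and scales the result to 0-100.
    """
    category_means = [
        mean(mean(scores) for scores in tasks.values())
        for tasks in results.values()
    ]
    return 100 * mean(category_means)


if __name__ == "__main__":
    print(exact_match(" 42 ", "42"))          # 1.0
    print(round(overall_score(results), 2))   # 70.83
```

Averaging by category before averaging across categories keeps a category with many tasks or questions from dominating the overall number, which is the design choice this sketch assumes.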

Publication

This benchmark was published in 2024.