BIRD-SQL


BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a cross-domain dataset that examines how large, realistic database contents affect text-to-SQL parsing. It contains 12,751 unique question-SQL pairs over 95 large databases (33.4 GB in total), spanning more than 37 professional domains, including blockchain, hockey, healthcare, and education.

Published: 2023
Score Range: 0-100
Top Score: 59.3

BIRD-SQL Leaderboard

Rank  Model                  Provider  Score  Parameters  Released    Type
1     Gemini 2.0 Pro         Google    59.3   not listed  2025-02-05  Multimodal
2     Gemini 2.0 Flash       Google    58.7   not listed  2025-02-25  Multimodal
3     Gemini 2.0 Flash-Lite  Google    57.4   not listed  2025-02-25  Multimodal

About BIRD-SQL

Methodology

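BIRD-SQL's headline metric is execution accuracy (EX). A predicted SQL query counts as correct only if running it against the database returns the same result as running the ground-truth query. The original paper also defines a Valid Efficiency Score (VES), which additionally rewards correct queries for running efficiently compared with the ground truth. Scores are reported as percentages on a 0-100 scale, and higher is better. For full details of the scoring system, refer to the original paper.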
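To make execution accuracy concrete, here is a minimal Python sketch built on the standard-library sqlite3 module (BIRD's databases are distributed as SQLite files). The sketch makes two assumptions of its own. First, result rows are compared as unordered sets, so row order and duplicate rows are ignored. Second, a predicted query that fails to execute counts as wrong. It is not a reproduction of the official evaluation script. The function names, the `players` table, and the sample queries are hypothetical.

```python
import sqlite3


def results_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same set of rows.

    Assumption (this sketch): rows are compared as sets, so ordering and
    duplicates are ignored; a predicted query that raises counts as wrong.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Percentage (0-100) of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


# Usage with a hypothetical in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (name TEXT, team TEXT, goals INTEGER);
INSERT INTO players VALUES ('A', 'X', 10), ('B', 'Y', 25), ('C', 'X', 7);
""")
pairs = [
    # Different SQL, same rows (A, B): correct.
    ("SELECT name FROM players WHERE goals > 8", "SELECT name FROM players WHERE goals >= 10"),
    # Only the ordering differs: correct under set comparison.
    ("SELECT name FROM players ORDER BY goals", "SELECT name FROM players"),
    # Misspelled column raises sqlite3.OperationalError: wrong.
    ("SELECT nme FROM players", "SELECT name FROM players"),
    # Different rows: wrong.
    ("SELECT name FROM players WHERE team = 'X'", "SELECT name FROM players WHERE team = 'Y'"),
]
print(execution_accuracy(conn, pairs))  # 50.0
```

Because the sketch compares sets instead of ordered lists, a missing or extra ORDER BY does not change the verdict. A stricter evaluator could compare ordered lists for questions that explicitly ask for ranked output.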

Publication

This benchmark was published in 2023; see the technical paper for full details.