BIRD-SQL
database | Verified
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. It contains 12,751 unique question-SQL pairs over 95 large databases with a total size of 33.4 GB, and covers more than 37 professional domains, including blockchain, hockey, healthcare, and education.
Published: 2023
Score Range: 0-100
Top Score: 59.3
BIRD-SQL Leaderboard
Rank | Model | Provider | Score | Parameters | Released | Type |
---|---|---|---|---|---|---|
1 | Gemini 2.0 Pro | | 59.3 | | 2025-02-05 | Multimodal |
2 | Gemini 2.0 Flash | | 58.7 | | 2025-02-25 | Multimodal |
3 | Gemini 2.0 Flash-Lite | | 57.4 | | 2025-02-25 | Multimodal |
About BIRD-SQL
Methodology
BIRD-SQL evaluates model performance using a standardized scoring methodology. Scores are reported on a scale of 0 to 100, where higher scores indicate better performance. For detailed information about the scoring system and methodology, please refer to the original paper.
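The page does not say which metric the score represents. As a rough illustration only, the sketch below assumes it is an execution-accuracy style measure: a predicted query counts as correct when it returns the same set of rows as the reference query on the same database. The helper names (`execution_match`, `execution_accuracy`), the toy `players` table, and the sample queries are all hypothetical and are not taken from the benchmark's own evaluation code.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to run is counted as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as sets, so row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage (0-100) of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database standing in for one benchmark database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (name TEXT, team TEXT, goals INTEGER);
    INSERT INTO players VALUES ('Ann', 'Red', 12), ('Bo', 'Red', 7), ('Cy', 'Blue', 9);
    """
)

pairs = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT name FROM players WHERE goals > 8",
     "SELECT name FROM players WHERE goals >= 9"),
    # Runs, but returns a different count: incorrect.
    ("SELECT COUNT(*) FROM players",
     "SELECT COUNT(*) FROM players WHERE team = 'Red'"),
    # Misspelled column raises an error: incorrect.
    ("SELECT nme FROM players",
     "SELECT name FROM players"),
    # Same single row via a different filter: correct.
    ("SELECT team FROM players WHERE name = 'Cy'",
     "SELECT team FROM players WHERE goals = 9"),
]

score = execution_accuracy(conn, pairs)
print(score)  # 50.0
```

Under this assumption, a reported score of 59.3 would mean that roughly 59 of every 100 generated queries returned the reference result. Comparing result sets rather than SQL text lets differently written but equivalent queries both count as correct.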
Publication
This benchmark was published in 2023.