Multi-IF
instruction · Verified
Multi-IF evaluates how well LLMs follow instructions in multi-turn, multilingual conversations. It spans 8 languages and contains 4,501 conversations of three turns each. Results show that performance degrades with each additional turn and that models struggle more with languages written in non-Latin scripts.
Published: 2024
Score Range: 0-100
Top Score: 71.9
About Multi-IF
Methodology
Scores are reported on a scale of 0 to 100, with higher scores indicating better performance. For the exact scoring procedure, see the original paper.
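The following is a minimal sketch showing how a 0 to 100 score can expose degradation from turn to turn. It assumes that each turn carries a list of verifiable instructions checked as pass/fail, and that a turn's score is the percentage of checks passed across all conversations. This is an illustrative assumption, not the benchmark's official metric. The names `TurnResult` and `per_turn_scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnResult:
    """Outcome of one conversation turn (hypothetical schema, not Multi-IF's format)."""
    turn: int            # turn index: 1, 2 or 3
    checks: list[bool]   # pass/fail result for each verifiable instruction in this turn


def per_turn_scores(results: list[TurnResult]) -> dict[int, float]:
    """Assumed metric: share of instruction checks passed per turn, scaled to 0-100."""
    passed: dict[int, int] = {}
    total: dict[int, int] = {}
    for r in results:
        passed[r.turn] = passed.get(r.turn, 0) + sum(r.checks)  # True counts as 1
        total[r.turn] = total.get(r.turn, 0) + len(r.checks)
    # Skip turns with no checks to avoid dividing by zero.
    return {t: 100.0 * passed[t] / total[t] for t in sorted(total) if total[t] > 0}


# Two made-up three-turn conversations, flattened into per-turn results.
example = [
    TurnResult(1, [True, True]),
    TurnResult(2, [True, False, False]),
    TurnResult(3, [True, False, False, True]),
    TurnResult(1, [True, False]),
    TurnResult(2, [True, True]),
    TurnResult(3, [False, True]),
]

if __name__ == "__main__":
    print(per_turn_scores(example))  # {1: 75.0, 2: 60.0, 3: 50.0}
```

On this made-up data the score falls from 75.0 to 60.0 to 50.0 across turns. That is the kind of per-turn drop the benchmark reports, although the real scoring rules may aggregate checks differently.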
Publication
This benchmark was published in 2024.
Technical Paper