CharXiv-Reasoning
Category: scientific
Status: Pending Verification
Tests reasoning about charts taken from arXiv papers across multiple scientific domains.
Published: 2023
Score Range: 0-100
Top Score: 78.6
About CharXiv-Reasoning
Methodology
CharXiv-Reasoning scores models with a standardized methodology and reports results on a scale of 0 to 100, where higher scores indicate better performance. For details of the scoring system, refer to the original paper.
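As a minimal sketch, assume the reported score is the percentage of questions a model answers correctly. This page does not spell out the grading procedure, so treat that as an assumption. A 0 to 100 score could then be computed as shown below. The function name `score_0_to_100` is hypothetical and is not part of any official CharXiv tooling.

```python
from typing import Sequence


def score_0_to_100(correct: Sequence[bool]) -> float:
    """Hypothetical helper: share of graded questions marked correct, scaled to 0-100.

    Assumes each entry records whether one answer was judged correct;
    this is an illustration, not the benchmark's official scorer.
    """
    if not correct:
        raise ValueError("need at least one graded question")
    # True counts as 1 and False as 0, so sum() gives the number correct.
    return 100.0 * sum(correct) / len(correct)


# Example: 3 of 4 answers judged correct.
print(score_0_to_100([True, True, False, True]))  # 75.0
```

Under this assumption, a top score of 78.6 would mean roughly 78.6% of questions were answered correctly.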
Publication
This benchmark was published in 2023.