
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
Papers citing "Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation"
16 / 16 papers shown
| Title | Authors |
|---|---|
| LiveBench: A Challenging, Contamination-Limited LLM Benchmark | Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, ..., Tom Goldstein, Willie Neiswanger, Micah Goldblum |