
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Papers citing "JudgeLM: Fine-tuned Large Language Models are Scalable Judges"
50 / 110 papers shown
Title |
---|
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models. Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, ..., Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen |
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models. Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, ..., Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo |
RewardBench: Evaluating Reward Models for Language Modeling. Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ..., Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi |