Title | Authors |
---|---|
RewardBench: Evaluating Reward Models for Language Modeling | Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ..., Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi |
Nash Learning from Human Feedback | Rémi Munos, Michal Valko, Daniele Calandriello, M. G. Azar, Mark Rowland, ..., Nikola Momchev, Olivier Bachem, D. Mankowitz, Doina Precup, Bilal Piot |
HuatuoGPT, towards Taming Language Model to Be a Doctor | Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, ..., Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li |