
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Papers citing "D2PO: Discriminator-Guided DPO with Response Evaluation Models"
15 / 15 papers shown
Title |
---|
RewardBench: Evaluating Reward Models for Language Modeling. Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ..., Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi |