
Reward Generalization in RLHF: A Topological Perspective
Papers citing "Reward Generalization in RLHF: A Topological Perspective"
Title |
---|
RewardBench: Evaluating Reward Models for Language Modeling. Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ... Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi |