
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Wei Xiong
Tong Zhang
Papers citing "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment"
50 / 119 papers shown
Title |
---|
RRM: Robust Reward Model Training Mitigates Reward Hacking. Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ...Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh |