
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Papers citing "SLiC-HF: Sequence Likelihood Calibration with Human Feedback"
Title |
---|
RRM: Robust Reward Model Training Mitigates Reward Hacking. Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ... Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh |