Title |
---|
![]() RRM: Robust Reward Model Training Mitigates Reward Hacking Tianqi Liu Wei Xiong Jie Jessie Ren Lichang Chen Junru Wu ...Yuan Liu Bilal Piot Abe Ittycheriah Aviral Kumar Mohammad Saleh |
![]() Towards a Unified View of Preference Learning for Large Language Models:
A Survey Bofei Gao Feifan Song Yibo Miao Zefan Cai Zhiyong Yang ...Houfeng Wang Zhifang Sui Peiyi Wang Baobao Chang Baobao Chang |
![]() Practical Aspects on Solving Differential Equations Using Deep Learning: A Primer Georgios Is. Detorakis |
![]() QPO: Query-dependent Prompt Optimization via Multi-Loop Offline
Reinforcement Learning Yilun Kong Hangyu Mao Qi Zhao Bin Zhang Jingqing Ruan Li Shen Yongzhe Chang Xueqian Wang Rui Zhao Dacheng Tao |