Title |
---|
![]() Towards a Unified View of Preference Learning for Large Language Models:
A Survey Bofei Gao Feifan Song Yibo Miao Zefan Cai Zhiyong Yang ...Houfeng Wang Zhifang Sui Peiyi Wang Baobao Chang Baobao Chang |
![]() Secrets of RLHF in Large Language Models Part II: Reward Modeling Bing Wang Rui Zheng Luyao Chen Yan Liu Shihan Dou ...Qi Zhang Xipeng Qiu Xuanjing Huang Zuxuan Wu Yuanyuan Jiang |
![]() Nash Learning from Human Feedback Rémi Munos Michal Valko Daniele Calandriello M. G. Azar Mark Rowland ...Nikola Momchev Olivier Bachem D. Mankowitz Doina Precup Bilal Piot |
![]() Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo Touvron Louis Martin Kevin R. Stone Peter Albert Amjad Almahairi ...Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov Thomas Scialom |