
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Papers citing "On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization"
17 / 17 papers shown
Title |
---|
Nash Learning from Human Feedback. Rémi Munos, Michal Valko, Daniele Calandriello, M. G. Azar, Mark Rowland, ..., Nikola Momchev, Olivier Bachem, D. Mankowitz, Doina Precup, Bilal Piot |