Title |
---|
![]() Nash Learning from Human Feedback Rémi Munos Michal Valko Daniele Calandriello M. G. Azar Mark Rowland ...Nikola Momchev Olivier Bachem D. Mankowitz Doina Precup Bilal Piot |
![]() Sample Efficient Preference Alignment in LLMs via Active Exploration Viraj Mehta Vikramjeet Das Ojash Neopane Yijia Dai Ilija Bogunovic Ilija Bogunovic Willie Neiswanger Stefano Ermon Jeff Schneider Willie Neiswanger |