Title |
---|
![]() Transfer Q Star: Principled Decoding for LLM Alignment Souradip Chakraborty Soumya Suvra Ghosal Ming Yin Dinesh Manocha Mengdi Wang Amrit Singh Bedi Furong Huang |
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
![]() Multi-turn Reinforcement Learning from Preference Human Feedback Lior Shani Aviv Rosenberg Asaf B. Cassel Oran Lang Daniele Calandriello ...Bilal Piot Idan Szpektor Avinatan Hassidim Yossi Matias Rémi Munos |