Title | Authors |
---|---|
Offline Regularised Reinforcement Learning for Large Language Models Alignment | Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ..., Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot |
SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration | Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, A. Abdolmaleki, ..., Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, N. Heess, Martin Riedmiller |