Title |
---|
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
![]() CoLT5: Faster Long-Range Transformers with Conditional Computation Joshua Ainslie Tao Lei Michiel de Jong Santiago Ontañón Siddhartha Brahma ...Mandy Guo James Lee-Thorp Yi Tay Yun-hsuan Sung Sumit Sanghai |