Title |
---|
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
![]() RewardBench: Evaluating Reward Models for Language Modeling Nathan Lambert Valentina Pyatkin Jacob Morrison Lester James V. Miranda Bill Yuchen Lin ...Sachin Kumar Tom Zick Yejin Choi Noah A. Smith Hanna Hajishirzi |
![]() Dolma: an Open Corpus of Three Trillion Tokens for Language Model
Pretraining Research Luca Soldaini Rodney Michael Kinney Akshita Bhagia Dustin Schwenk David Atkinson ...Hanna Hajishirzi Iz Beltagy Dirk Groeneveld Jesse Dodge Kyle Lo |
![]() Mistral 7B Albert Q. Jiang Alexandre Sablayrolles A. Mensch Chris Bamford Devendra Singh Chaplot ...Teven Le Scao Thibaut Lavril Thomas Wang Timothée Lacroix William El Sayed |