Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.19188
Cited By
Averaging log-likelihoods in direct alignment
27 June 2024
Nathan Grinsztajn
Yannis Flet-Berliac
M. G. Azar
Florian Strub
Bill Wu
Eugene Choi
Chris Cremer
Arash Ahmadian
Yash Chandak
Olivier Pietquin
Matthieu Geist
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Averaging log-likelihoods in direct alignment"
4 / 4 papers shown
Title
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
14
0
0
16 May 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
46
3
0
25 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
Self-Improving Robust Preference Optimization
Eugene Choi
Arash Ahmadian
Matthieu Geist
Oilvier Pietquin
M. G. Azar
33
8
0
03 Jun 2024
1