arXiv: 2305.14718
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
24 May 2023
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
Papers citing "Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models"
8 / 8 papers shown
Title
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
38
1
0
20 Feb 2025
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
98
29
0
16 Apr 2024
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
37
17
0
18 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
128
102
0
05 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
330
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
395
8,495
0
28 Jan 2022
Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
Marcin Junczys-Dowmunt
Roman Grundkiewicz
Shubha Guha
Kenneth Heafield
33
192
0
16 Apr 2018