Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.11871
Cited By
Offline RL for Natural Language Generation with Implicit Language Q Learning
5 June 2022
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Offline RL for Natural Language Generation with Implicit Language Q Learning"
30 / 30 papers shown
Title
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
41
0
0
06 May 2025
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
64
0
0
03 Apr 2025
Yes, Q-learning Helps Offline In-Context RL
Denis Tarasov
Alexander Nikulin
Ilya Zisman
Albina Klepach
Andrei Polubarov
Nikita Lyubaykin
Alexander Derevyagin
Igor Kiselev
Vladislav Kurenkov
OffRL
OnRL
175
0
0
24 Feb 2025
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
76
0
0
17 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
54
4
0
29 Jan 2025
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Evangelos Anagnostopoulos
Christos Varytimidis
Antonio del Rio Chanona
40
0
0
13 Jan 2025
Time-Reversal Provides Unsupervised Feedback to LLMs
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM
SyDa
64
0
0
03 Dec 2024
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
OffRL
LRM
48
9
0
11 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
58
3
0
06 Oct 2024
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen
Omar Hagrass
Jason M. Klusowski
32
2
0
04 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua-Hong Wu
150
1
0
03 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
1
0
02 Oct 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
45
12
0
11 Sep 2024
Towards Aligning Language Models with Textual Feedback
Sauc Abadal Lloret
S. Dhuliawala
K. Murugesan
Mrinmaya Sachan
VLM
46
1
0
24 Jul 2024
InstructPatentGPT: Training patent language models to follow instructions with human feedback
Jieh-Sheng Lee
ALM
46
6
0
25 May 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Workflow-Guided Response Generation for Task-Oriented Dialogue
Do June Min
Paloma Sodhi
Ramya Ramakrishnan
42
0
0
14 Nov 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
44
54
0
30 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Jesse Zhang
Karl Pertsch
Jiahui Zhang
Joseph J. Lim
LM&Ro
36
17
0
20 Jun 2023
Deep RL with Hierarchical Action Exploration for Dialogue Generation
Itsugun Cho
Ryota Takahashi
Yusaku Yanase
Hiroaki Saito
22
2
0
22 Mar 2023
A Survey on Transformers in Reinforcement Learning
Wenzhe Li
Hao Luo
Zichuan Lin
Chongjie Zhang
Zongqing Lu
Deheng Ye
OffRL
MU
AI4CE
37
55
0
08 Jan 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
239
0
03 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
325
11,953
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
843
0
12 Oct 2021
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
209
119
0
21 Jul 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
340
1,960
0
04 May 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
286
1,595
0
18 Sep 2019
A Review of Evaluation Techniques for Social Dialogue Systems
A. C. Curry
H. Hastie
Verena Rieser
118
13
0
13 Sep 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016
1