2307.12425
Cited By
On the Effectiveness of Offline RL for Dialogue Response Generation
23 July 2023
Paloma Sodhi
Felix Wu
Ethan R. Elenberg
Kilian Q. Weinberger
Ryan T. McDonald
OffRL
Papers citing "On the Effectiveness of Offline RL for Dialogue Response Generation"
15 / 15 papers shown
Title
lilGym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu
Kianté Brantley
Noriyuki Kojima
Yoav Artzi
ReLM
OffRL
LRM
75
4
0
03 Nov 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
72
244
0
03 Oct 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
99
213
0
26 May 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
272
874
0
12 Oct 2021
Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation
Samuel Kiegeland
Julia Kreutzer
AAML
57
46
0
16 Jun 2021
Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems
Derek Chen
Howard Chen
Yi Yang
A. Lin
Zhou Yu
57
66
0
01 Apr 2021
Human-centric Dialog Training via Offline Reinforcement Learning
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
59
95
0
12 Oct 2020
MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
Xiaoxue Zang
Abhinav Rastogi
Srinivas Sunkara
Raghav Gupta
Jianguo Zhang
Jindong Chen
65
276
0
10 Jul 2020
Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar
Aurick Zhou
George Tucker
Sergey Levine
OffRL
OnRL
115
1,780
0
08 Jun 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
159
7,437
0
02 Oct 2019
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques
Asma Ghandeharioun
J. Shen
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
88
338
0
30 Jun 2019
Building a Production Model for Retrieval-Based Chatbots
Kyle Swanson
L. Yu
C. Fox
Jeremy Wohlwend
Tao Lei
39
11
0
07 Jun 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
256
18,685
0
20 Jul 2017
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
156
1,551
0
11 May 2017
Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato
S. Chopra
Michael Auli
Wojciech Zaremba
87
1,611
0
20 Nov 2015