Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.05848
Cited By
Human-centric Dialog Training via Offline Reinforcement Learning
12 October 2020
Natasha Jaques
J. Shen
Asma Ghandeharioun
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Human-centric Dialog Training via Offline Reinforcement Learning"
30 / 30 papers shown
Title
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval
Kidist Amde Mekonnen
Yubao Tang
Maarten de Rijke
60
0
0
07 Apr 2025
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
64
0
0
03 Apr 2025
Problem Solving Through Human-AI Preference-Based Cooperation
Subhabrata Dutta
Timo Kaufmann
Goran Glavas
Ivan Habernal
Kristian Kersting
Frauke Kreuter
Mira Mezini
Iryna Gurevych
Eyke Hüllermeier
Hinrich Schuetze
98
1
0
14 Aug 2024
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
17
8
0
23 Aug 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
29
23
0
01 Jun 2023
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Qiang Zhang
Jason Naradowsky
Yusuke Miyao
ELM
26
32
0
29 May 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
31
54
0
29 May 2023
Continually Improving Extractive QA via Human Feedback
Ge Gao
Hung-Ting Chen
Yoav Artzi
Eunsol Choi
26
12
0
21 May 2023
Deep RL with Hierarchical Action Exploration for Dialogue Generation
Itsugun Cho
Ryota Takahashi
Yusaku Yanase
Hiroaki Saito
25
2
0
22 Mar 2023
CausalDialogue: Modeling Utterance-level Causality in Conversations
Yi-Lin Tuan
Alon Albalak
Wenda Xu
Michael Stephen Saxon
Connor Pryor
Lise Getoor
William Yang Wang
CML
37
2
0
20 Dec 2022
Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
D. Akimov
Vladislav Kurenkov
Alexander Nikulin
Denis Tarasov
Sergey Kolesnikov
OffRL
19
9
0
20 Nov 2022
Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
Maarten Sap
Ronan Le Bras
Daniel Fried
Yejin Choi
27
209
0
24 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
240
0
03 Oct 2022
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
Mengdi Xu
Zuxin Liu
Peide Huang
Wenhao Ding
Zhepeng Cen
Bo-wen Li
Ding Zhao
74
45
0
16 Sep 2022
Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
ALM
20
18
0
30 Aug 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
128
102
0
05 Jun 2022
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
25
14
0
06 Apr 2022
Can Wikipedia Help Offline Reinforcement Learning?
Machel Reid
Yutaro Yamada
S. Gu
3DV
RALM
OffRL
140
95
0
28 Jan 2022
Pessimistic Model Selection for Offline Deep Reinforcement Learning
Chao-Han Huck Yang
Zhengling Qi
Yifan Cui
Pin-Yu Chen
OffRL
24
4
0
29 Nov 2021
Offline Reinforcement Learning with Soft Behavior Regularization
Haoran Xu
Xianyuan Zhan
Jianxiong Li
Honglei Yin
OffRL
26
31
0
14 Oct 2021
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
Tianhe Yu
Aviral Kumar
Yevgen Chebotar
Karol Hausman
Sergey Levine
Chelsea Finn
OffRL
35
77
0
16 Sep 2021
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
Noriyuki Kojima
Alane Suhr
Yoav Artzi
25
24
0
10 Aug 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
54
268
0
10 May 2021
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
219
415
0
16 Feb 2021
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
209
119
0
21 Jul 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
298
1,610
0
18 Sep 2019
Dialogue Learning With Human-In-The-Loop
Jiwei Li
Alexander H. Miller
S. Chopra
MarcÁurelio Ranzato
Jason Weston
OffRL
227
134
0
29 Nov 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,327
0
05 Jun 2016
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
285
9,145
0
06 Jun 2015
Off-Policy Actor-Critic
T. Degris
Martha White
R. Sutton
OffRL
CML
163
220
0
22 May 2012
1