2210.13623
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
24 October 2022
Baihan Lin
OffRL
AI4TS
Papers citing
"Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook"
43 / 93 papers shown
Title
Introduction to Multi-Armed Bandits
Aleksandrs Slivkins
237
999
0
15 Apr 2019
An Algorithmic Perspective on Imitation Learning
Takayuki Osa
Joni Pajarinen
Gerhard Neumann
J. Andrew Bagnell
Pieter Abbeel
Jan Peters
77
833
0
16 Nov 2018
Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition
Yih-Liang Shen
Chao Huang
Syu-Siang Wang
Yu Tsao
H. Wang
T. Chi
36
27
0
10 Nov 2018
Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
Richard He Bai
Yu Zhou
Jiajun Zhang
Liang Zhao
M. Hwang
Chengqing Zong
32
10
0
19 Aug 2018
Deep Contextual Multi-armed Bandits
Mark Collier
H. Llorens
22
33
0
25 Jul 2018
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Saurabh Arora
Prashant Doshi
OffRL
63
602
0
18 Jun 2018
Deep Reinforcement Learning For Sequence to Sequence Models
Yaser Keneshloo
Tian Shi
Naren Ramakrishnan
Chandan K. Reddy
AIMat
3DV
OffRL
41
209
0
24 May 2018
Toward Diverse Text Generation with Inverse Reinforcement Learning
Zhan Shi
Xinchi Chen
Xipeng Qiu
Xuanjing Huang
38
104
0
30 Apr 2018
Occam's razor is insufficient to infer the preferences of irrational agents
Stuart Armstrong
Sören Mindermann
50
93
0
15 Dec 2017
Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection
Taku Kato
T. Shinozaki
30
21
0
10 Nov 2017
Paraphrase Generation with Deep Reinforcement Learning
Zichao Li
Xin Jiang
Lifeng Shang
Hang Li
OffRL
59
213
0
01 Nov 2017
Sequence-to-Sequence ASR Optimization via Reinforcement Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AI4TS
73
25
0
30 Oct 2017
Emergent Complexity via Multi-Agent Competition
Trapit Bansal
J. Pachocki
Szymon Sidor
Ilya Sutskever
Igor Mordatch
48
384
0
10 Oct 2017
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Victor Zhong
Caiming Xiong
R. Socher
RALM
75
1,184
0
31 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
206
18,685
0
20 Jul 2017
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog
Satwik Kottur
José M. F. Moura
Stefan Lee
Dhruv Batra
LLMAG
53
218
0
26 Jun 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
91
3,197
0
12 Jun 2017
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Djallel Bouneffouf
Irina Rish
Guillermo Cecchi
13
29
0
07 Jun 2017
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads
Ji He
Mari Ostendorf
Xiaodong He
OffRL
LRM
24
10
0
20 Apr 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
96
425
0
20 Mar 2017
Emergence of Grounded Compositional Language in Multi-Agent Populations
Igor Mordatch
Pieter Abbeel
LLMAG
99
701
0
15 Mar 2017
Deep Reinforcement Learning: An Overview
Yuxi Li
OffRL
VLM
136
1,517
0
25 Jan 2017
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
Chelsea Finn
Paul Christiano
Pieter Abbeel
Sergey Levine
OffRL
AI4CE
GAN
44
353
0
11 Nov 2016
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Lantao Yu
Weinan Zhang
Jun Wang
Yong Yu
GAN
42
2,397
0
18 Sep 2016
Generative Adversarial Imitation Learning
Jonathan Ho
Stefano Ermon
GAN
111
3,084
0
10 Jun 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
256
1,331
0
05 Jun 2016
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Lisha Li
Kevin Jamieson
Giulia DeSalvo
Afshin Rostamizadeh
Ameet Talwalkar
162
2,307
0
21 Mar 2016
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Chelsea Finn
Sergey Levine
Pieter Abbeel
87
946
0
01 Mar 2016
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
Tor Lattimore
41
47
0
18 Nov 2015
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
127
621
0
11 Nov 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
181
13,174
0
09 Sep 2015
Multi-armed Bandit Problem with Known Trend
Djallel Bouneffouf
Raphael Feraud
30
83
0
28 Aug 2015
A Survey on Contextual Multi-armed Bandits
Li Zhou
42
124
0
13 Aug 2015
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Karthik Narasimhan
Tejas D. Kulkarni
Regina Barzilay
OffRL
63
361
0
30 Jun 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
38
3,368
0
08 Jun 2015
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
R. Sutton
A. R. Mahmood
Martha White
53
267
0
14 Mar 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
237
6,722
0
19 Feb 2015
Online Stochastic Optimization under Correlated Bandit Feedback
M. G. Azar
A. Lazaric
Emma Brunskill
OffRL
46
54
0
04 Feb 2014
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
98
12,163
0
19 Dec 2013
Bandits with Knapsacks
Ashwinkumar Badanidiyuru
Robert D. Kleinberg
Aleksandrs Slivkins
62
429
0
11 May 2013
Thompson Sampling for Contextual Bandits with Linear Payoffs
Shipra Agrawal
Navin Goyal
128
993
0
15 Sep 2012
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Stéphane Ross
Geoffrey J. Gordon
J. Andrew Bagnell
OffRL
155
3,196
0
02 Nov 2010
On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
Aurélien Garivier
Eric Moulines
67
294
0
22 May 2008