ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2210.13623

Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook

24 October 2022
Baihan Lin
    OffRL
    AI4TS

Papers cited by "Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook"

43 / 93 papers shown
Title
Introduction to Multi-Armed Bandits
Aleksandrs Slivkins
237
999
0
15 Apr 2019
An Algorithmic Perspective on Imitation Learning
Takayuki Osa
Joni Pajarinen
Gerhard Neumann
J. Andrew Bagnell
Pieter Abbeel
Jan Peters
77
833
0
16 Nov 2018
Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition
Yih-Liang Shen
Chao Huang
Syu-Siang Wang
Yu Tsao
H. Wang
T. Chi
36
27
0
10 Nov 2018
Source-Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
Richard He Bai
Yu Zhou
Jiajun Zhang
Liang Zhao
M. Hwang
Chengqing Zong
32
10
0
19 Aug 2018
Deep Contextual Multi-armed Bandits
Mark Collier
H. Llorens
22
33
0
25 Jul 2018
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Saurabh Arora
Prashant Doshi
OffRL
63
602
0
18 Jun 2018
Deep Reinforcement Learning For Sequence to Sequence Models
Yaser Keneshloo
Tian Shi
Naren Ramakrishnan
Chandan K. Reddy
AIMat
3DV
OffRL
41
209
0
24 May 2018
Toward Diverse Text Generation with Inverse Reinforcement Learning
Zhan Shi
Xinchi Chen
Xipeng Qiu
Xuanjing Huang
38
104
0
30 Apr 2018
Occam's razor is insufficient to infer the preferences of irrational agents
Stuart Armstrong
Sören Mindermann
50
93
0
15 Dec 2017
Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection
Taku Kato
T. Shinozaki
30
21
0
10 Nov 2017
Paraphrase Generation with Deep Reinforcement Learning
Zichao Li
Xin Jiang
Lifeng Shang
Hang Li
OffRL
59
213
0
01 Nov 2017
Sequence-to-Sequence ASR Optimization via Reinforcement Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AI4TS
73
25
0
30 Oct 2017
Emergent Complexity via Multi-Agent Competition
Trapit Bansal
J. Pachocki
Szymon Sidor
Ilya Sutskever
Igor Mordatch
48
384
0
10 Oct 2017
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Victor Zhong
Caiming Xiong
R. Socher
RALM
75
1,184
0
31 Aug 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
206
18,685
0
20 Jul 2017
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog
Satwik Kottur
José M. F. Moura
Stefan Lee
Dhruv Batra
LLMAG
53
218
0
26 Jun 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
91
3,197
0
12 Jun 2017
Bandit Models of Human Behavior: Reward Processing in Mental Disorders
Djallel Bouneffouf
Irina Rish
Guillermo Cecchi
13
29
0
07 Jun 2017
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads
Ji He
Mari Ostendorf
Xiaodong He
OffRL
LRM
24
10
0
20 Apr 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
96
425
0
20 Mar 2017
Emergence of Grounded Compositional Language in Multi-Agent Populations
Igor Mordatch
Pieter Abbeel
LLMAG
99
701
0
15 Mar 2017
Deep Reinforcement Learning: An Overview
Yuxi Li
OffRL
VLM
136
1,517
0
25 Jan 2017
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
Chelsea Finn
Paul Christiano
Pieter Abbeel
Sergey Levine
OffRL
AI4CE
GAN
44
353
0
11 Nov 2016
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Lantao Yu
Weinan Zhang
Jun Wang
Yong Yu
GAN
42
2,397
0
18 Sep 2016
Generative Adversarial Imitation Learning
Jonathan Ho
Stefano Ermon
GAN
111
3,084
0
10 Jun 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
256
1,331
0
05 Jun 2016
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Lisha Li
Kevin Jamieson
Giulia DeSalvo
Afshin Rostamizadeh
Ameet Talwalkar
162
2,307
0
21 Mar 2016
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Chelsea Finn
Sergey Levine
Pieter Abbeel
87
946
0
01 Mar 2016
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
Tor Lattimore
41
47
0
18 Nov 2015
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
127
621
0
11 Nov 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
181
13,174
0
09 Sep 2015
Multi-armed Bandit Problem with Known Trend
Djallel Bouneffouf
Raphael Feraud
30
83
0
28 Aug 2015
A Survey on Contextual Multi-armed Bandits
Li Zhou
42
124
0
13 Aug 2015
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Karthik Narasimhan
Tejas D. Kulkarni
Regina Barzilay
OffRL
63
361
0
30 Jun 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
38
3,368
0
08 Jun 2015
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
R. Sutton
A. R. Mahmood
Martha White
53
267
0
14 Mar 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
237
6,722
0
19 Feb 2015
Online Stochastic Optimization under Correlated Bandit Feedback
M. G. Azar
A. Lazaric
Emma Brunskill
OffRL
46
54
0
04 Feb 2014
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
98
12,163
0
19 Dec 2013
Bandits with Knapsacks
Ashwinkumar Badanidiyuru
Robert D. Kleinberg
Aleksandrs Slivkins
62
429
0
11 May 2013
Thompson Sampling for Contextual Bandits with Linear Payoffs
Shipra Agrawal
Navin Goyal
128
993
0
15 Sep 2012
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Stéphane Ross
Geoffrey J. Gordon
J. Andrew Bagnell
OffRL
155
3,196
0
02 Nov 2010
On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
Aurélien Garivier
Eric Moulines
67
294
0
22 May 2008