arXiv: 1908.00261
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"
50 / 222 papers shown
Title
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Andrea Zanette
OffRL
250
71
0
14 Dec 2020
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
Long Yang
Qian Zheng
Gang Pan
109
21
0
02 Dec 2020
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu
Yingbin Liang
Guanghui Lan
123
128
0
11 Nov 2020
A Study of Policy Gradient on a Class of Exactly Solvable Models
Gavin McCracken
Colin Daniels
Rosie Zhao
Anna M. Brandenberger
Prakash Panangaden
Doina Precup
49
0
0
03 Nov 2020
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
Wenhao Yang
Xiang Li
Guangzeng Xie
Zhihua Zhang
104
5
0
31 Oct 2020
Conservative Safety Critics for Exploration
Homanga Bharadhwaj
Aviral Kumar
Nicholas Rhinehart
Sergey Levine
Florian Shkurti
Animesh Garg
OffRL
119
139
0
27 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
91
16
0
22 Oct 2020
Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning
Shen Ren
Qianxiao Li
Liye Zhang
Zheng Qin
Bo Yang
30
0
0
22 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
137
113
0
22 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
111
40
0
21 Oct 2020
Provable Fictitious Play for General Mean-Field Games
Qiaomin Xie
Zhuoran Yang
Zhaoran Wang
Andreea Minca
86
18
0
08 Oct 2020
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
Shangtong Zhang
Romain Laroche
H. V. Seijen
Shimon Whiteson
Rémi Tachet des Combes
135
15
0
02 Oct 2020
Revisiting Design Choices in Proximal Policy Optimization
Chloe Ching-Yun Hsu
Celestine Mendler-Dünner
Moritz Hardt
169
57
0
23 Sep 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
115
24
0
31 Aug 2020
On the Sample Complexity of Reinforcement Learning with Policy Space Generalization
Wenlong Mou
Zheng Wen
Xi Chen
81
11
0
17 Aug 2020
Reinforcement Learning with Trajectory Feedback
Yonathan Efroni
Nadav Merlis
Shie Mannor
116
45
0
13 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
91
43
0
02 Aug 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
94
43
0
23 Jul 2020
Approximation Benefits of Policy Gradient Methods with Aggregated States
Daniel Russo
125
7
0
22 Jul 2020
On Linear Convergence of Policy Gradient Methods for Finite MDPs
Jalaj Bhandari
Daniel Russo
134
61
0
21 Jul 2020
A Short Note on Soft-max and Policy Gradients in Bandits Problems
N. Walton
59
1
0
20 Jul 2020
Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits
D. Denisov
N. Walton
83
8
0
20 Jul 2020
Provably Good Batch Reinforcement Learning Without Great Exploration
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
206
105
0
16 Jul 2020
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Jianchao Tan
Sham Kakade
Tamer Başar
Lin F. Yang
151
123
0
15 Jul 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
Junyu Zhang
Alec Koppel
Amrit Singh Bedi
Csaba Szepesvári
Mengdi Wang
91
140
0
04 Jul 2020
Deep Bayesian Quadrature Policy Optimization
Akella Ravi Tej
Kamyar Azizzadenesheli
Mohammad Ghavamzadeh
Anima Anandkumar
Yisong Yue
77
5
0
28 Jun 2020
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence
Ziwei Guan
Tengyu Xu
Yingbin Liang
85
16
0
24 Jun 2020
Ecological Reinforcement Learning
John D. Co-Reyes
Suvansh Sanjeev
Glen Berseth
Abhishek Gupta
Sergey Levine
OffRL
107
23
0
22 Jun 2020
Information Theoretic Regret Bounds for Online Nonlinear Control
Sham Kakade
A. Krishnamurthy
Kendall Lowrey
Motoya Ohnishi
Wen Sun
101
119
0
22 Jun 2020
Safe Reinforcement Learning via Curriculum Induction
M. Turchetta
Andrey Kolobov
S. Shah
Andreas Krause
Alekh Agarwal
82
93
0
22 Jun 2020
On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang
S. Du
Lin F. Yang
Ruslan Salakhutdinov
OffRL
93
107
0
19 Jun 2020
An operator view of policy gradient methods
Dibya Ghosh
Marlos C. Machado
Nicolas Le Roux
OffRL
58
27
0
19 Jun 2020
Stochastic Optimization for Performative Prediction
Celestine Mendler-Dünner
Juan C. Perdomo
Tijana Zrnic
Moritz Hardt
75
115
0
12 Jun 2020
Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
Wei Shen
Yuanying Cai
Longbo Huang
Jian Li
OffRL
82
1
0
11 Jun 2020
Meta-Learning Bandit Policies by Gradient Ascent
Branislav Kveton
Martin Mladenov
Chih-Wei Hsu
Manzil Zaheer
Csaba Szepesvári
Craig Boutilier
76
9
0
09 Jun 2020
A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning
Sihan Zeng
Aqeel Anwar
Thinh T. Doan
A. Raychowdhury
Justin Romberg
90
40
0
08 Jun 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
192
294
0
13 May 2020
Reinforcement Learning with Feedback Graphs
Christoph Dann
Yishay Mansour
M. Mohri
Ayush Sekhari
Karthik Sridharan
53
9
0
07 May 2020
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
104
58
0
07 May 2020
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
89
25
0
27 Apr 2020
A Game Theoretic Framework for Model Based Reinforcement Learning
Aravind Rajeswaran
Igor Mordatch
Vikash Kumar
OffRL
67
128
0
16 Apr 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GP
OffRL
273
1,387
0
15 Apr 2020
Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Fei Feng
Ruosong Wang
W. Yin
S. Du
Lin F. Yang
OffRL
SSL
93
7
0
15 Mar 2020
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Tengyang Xie
Nan Jiang
171
35
0
09 Mar 2020
Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Yufeng Zhang
Qi Cai
Zhuoran Yang
Zhaoran Wang
223
12
0
08 Mar 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
183
50
0
02 Mar 2020
Policy-Aware Model Learning for Policy Gradient Methods
Romina Abachi
Mohammad Ghavamzadeh
Amir-massoud Farahmand
79
36
0
28 Feb 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
103
90
0
19 Feb 2020
Differentiable Bandit Exploration
Craig Boutilier
Chih-Wei Hsu
Branislav Kveton
Martin Mladenov
Csaba Szepesvári
Manzil Zaheer
BDL
OffRL
59
7
0
17 Feb 2020
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Huaqing Xiong
Tengyu Xu
Yingbin Liang
Wei Zhang
86
33
0
15 Feb 2020