ResearchTrend.AI
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown
Title
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
Andrea Zanette
OffRL
250
71
0
14 Dec 2020
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
Long Yang
Qian Zheng
Gang Pan
109
21
0
02 Dec 2020
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu
Yingbin Liang
Guanghui Lan
123
128
0
11 Nov 2020
A Study of Policy Gradient on a Class of Exactly Solvable Models
Gavin McCracken
Colin Daniels
Rosie Zhao
Anna M. Brandenberger
Prakash Panangaden
Doina Precup
49
0
0
03 Nov 2020
Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
Wenhao Yang
Xiang Li
Guangzeng Xie
Zhihua Zhang
104
5
0
31 Oct 2020
Conservative Safety Critics for Exploration
Homanga Bharadhwaj
Aviral Kumar
Nicholas Rhinehart
Sergey Levine
Florian Shkurti
Animesh Garg
OffRL
119
139
0
27 Oct 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Andrea Agazzi
Jianfeng Lu
91
16
0
22 Oct 2020
Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning
Shen Ren
Qianxiao Li
Liye Zhang
Zheng Qin
Bo Yang
30
0
0
22 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
137
113
0
22 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
111
40
0
21 Oct 2020
Provable Fictitious Play for General Mean-Field Games
Qiaomin Xie
Zhuoran Yang
Zhaoran Wang
Andreea Minca
86
18
0
08 Oct 2020
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
Shangtong Zhang
Romain Laroche
H. V. Seijen
Shimon Whiteson
Rémi Tachet des Combes
135
15
0
02 Oct 2020
Revisiting Design Choices in Proximal Policy Optimization
Chloe Ching-Yun Hsu
Celestine Mendler-Dünner
Moritz Hardt
169
57
0
23 Sep 2020
Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Wesley Chung
Valentin Thomas
Marlos C. Machado
Nicolas Le Roux
OffRL
115
24
0
31 Aug 2020
On the Sample Complexity of Reinforcement Learning with Policy Space Generalization
Wenlong Mou
Zheng Wen
Xi Chen
81
11
0
17 Aug 2020
Reinforcement Learning with Trajectory Feedback
Yonathan Efroni
Nadav Merlis
Shie Mannor
116
45
0
13 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
91
43
0
02 Aug 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
94
43
0
23 Jul 2020
Approximation Benefits of Policy Gradient Methods with Aggregated States
Daniel Russo
125
7
0
22 Jul 2020
On Linear Convergence of Policy Gradient Methods for Finite MDPs
Jalaj Bhandari
Daniel Russo
134
61
0
21 Jul 2020
A Short Note on Soft-max and Policy Gradients in Bandits Problems
N. Walton
59
1
0
20 Jul 2020
Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits
D. Denisov
N. Walton
83
8
0
20 Jul 2020
Provably Good Batch Reinforcement Learning Without Great Exploration
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
206
105
0
16 Jul 2020
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kaiqing Zhang
Sham Kakade
Tamer Başar
Lin F. Yang
151
123
0
15 Jul 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
Junyu Zhang
Alec Koppel
Amrit Singh Bedi
Csaba Szepesvári
Mengdi Wang
91
140
0
04 Jul 2020
Deep Bayesian Quadrature Policy Optimization
Akella Ravi Tej
Kamyar Azizzadenesheli
Mohammad Ghavamzadeh
Anima Anandkumar
Yisong Yue
77
5
0
28 Jun 2020
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence
Ziwei Guan
Tengyu Xu
Yingbin Liang
85
16
0
24 Jun 2020
Ecological Reinforcement Learning
John D. Co-Reyes
Suvansh Sanjeev
Glen Berseth
Abhishek Gupta
Sergey Levine
OffRL
107
23
0
22 Jun 2020
Information Theoretic Regret Bounds for Online Nonlinear Control
Sham Kakade
A. Krishnamurthy
Kendall Lowrey
Motoya Ohnishi
Wen Sun
101
119
0
22 Jun 2020
Safe Reinforcement Learning via Curriculum Induction
M. Turchetta
Andrey Kolobov
S. Shah
Andreas Krause
Alekh Agarwal
82
93
0
22 Jun 2020
On Reward-Free Reinforcement Learning with Linear Function Approximation
Ruosong Wang
S. Du
Lin F. Yang
Ruslan Salakhutdinov
OffRL
93
107
0
19 Jun 2020
An operator view of policy gradient methods
Dibya Ghosh
Marlos C. Machado
Nicolas Le Roux
OffRL
58
27
0
19 Jun 2020
Stochastic Optimization for Performative Prediction
Celestine Mendler-Dünner
Juan C. Perdomo
Tijana Zrnic
Moritz Hardt
75
115
0
12 Jun 2020
Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework
Wei Shen
Yuanying Cai
Longbo Huang
Jian Li
OffRL
82
1
0
11 Jun 2020
Meta-Learning Bandit Policies by Gradient Ascent
Branislav Kveton
Martin Mladenov
Chih-Wei Hsu
Manzil Zaheer
Csaba Szepesvári
Craig Boutilier
76
9
0
09 Jun 2020
A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning
Sihan Zeng
Aqeel Anwar
Thinh T. Doan
A. Raychowdhury
Justin Romberg
90
40
0
08 Jun 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
192
294
0
13 May 2020
Reinforcement Learning with Feedback Graphs
Christoph Dann
Yishay Mansour
M. Mohri
Ayush Sekhari
Karthik Sridharan
53
9
0
07 May 2020
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
104
58
0
07 May 2020
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
89
25
0
27 Apr 2020
A Game Theoretic Framework for Model Based Reinforcement Learning
Aravind Rajeswaran
Igor Mordatch
Vikash Kumar
OffRL
67
128
0
16 Apr 2020
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Justin Fu
Aviral Kumar
Ofir Nachum
George Tucker
Sergey Levine
GP, OffRL
273
1,387
0
15 Apr 2020
Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning
Fei Feng
Ruosong Wang
W. Yin
S. Du
Lin F. Yang
OffRL, SSL
93
7
0
15 Mar 2020
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Tengyang Xie
Nan Jiang
171
35
0
09 Mar 2020
Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Yufeng Zhang
Qi Cai
Zhuoran Yang
Zhaoran Wang
223
12
0
08 Mar 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
183
50
0
02 Mar 2020
Policy-Aware Model Learning for Policy Gradient Methods
Romina Abachi
Mohammad Ghavamzadeh
Amir-massoud Farahmand
79
36
0
28 Feb 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
103
90
0
19 Feb 2020
Differentiable Bandit Exploration
Craig Boutilier
Chih-Wei Hsu
Branislav Kveton
Martin Mladenov
Csaba Szepesvári
Manzil Zaheer
BDL, OffRL
59
7
0
17 Feb 2020
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Huaqing Xiong
Tengyu Xu
Yingbin Liang
Wei Zhang
86
33
0
15 Feb 2020