
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
Mizhaan Prajit Maniyar
Akash Mondal
Prashanth L.A.
S. Bhatnagar
78
1
0
21 Apr 2023
Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning
Hector Kohler
R. Akrour
Philippe Preux
OffRL
65
0
0
11 Apr 2023
Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems
Sihan Zeng
Thinh T. Doan
Justin Romberg
OffRL
71
3
0
23 Mar 2023
Policy Mirror Descent Inherently Explores Action Space
Yan Li
Guanghui Lan
OffRL
132
8
0
08 Mar 2023
Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation
Gagan Khandate
Siqi Shang
Eric Chang
Tristan L. Saidi
Yang Liu
Seth Matthew Dennis
Johnson Adams
M. Ciocarlie
108
32
0
06 Mar 2023
Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient
Xiangyuan Zhang
Tamer Basar
84
20
0
25 Feb 2023
Best of Both Worlds Policy Optimization
Christoph Dann
Chen-Yu Wei
Julian Zimmert
103
12
0
18 Feb 2023
Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation
Qiwen Cui
Jianchao Tan
S. Du
129
24
0
07 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
158
15
0
30 Jan 2023
Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic
Wesley A Suttle
Amrit Singh Bedi
Bhrij Patel
Brian M Sadler
Alec Koppel
Dinesh Manocha
102
16
0
28 Jan 2023
Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems
Xin Liu
Honghao Wei
Lei Ying
125
6
0
13 Dec 2022
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Yizhou Zhang
Guannan Qu
Pan Xu
Yiheng Lin
Zaiwei Chen
Adam Wierman
96
26
0
30 Nov 2022
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
Mudit Gaur
Vaneet Aggarwal
Mridul Agarwal
MLT
113
1
0
14 Nov 2022
Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence
S. Pattathil
Jianchao Tan
Asuman Ozdaglar
99
14
0
23 Oct 2022
Finite-time analysis of single-timescale actor-critic
Xu-yang Chen
Lin Zhao
OffRL
89
24
0
18 Oct 2022
On the convergence of policy gradient methods to Nash equilibria in general stochastic games
Angeliki Giannou
Kyriakos Lotidis
P. Mertikopoulos
Emmanouil-Vasileios Vlatakis-Gkaragkounis
127
18
0
17 Oct 2022
Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games
Yan Chen
Taoying Li
65
2
0
14 Oct 2022
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
Yuda Song
Yi Zhou
Ayush Sekhari
J. Andrew Bagnell
A. Krishnamurthy
Wen Sun
OffRL OnRL
115
105
0
13 Oct 2022
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
Shicong Cen
Yuejie Chi
S. Du
Lin Xiao
136
38
0
03 Oct 2022
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation
Xiaoteng Ma
Zhipeng Liang
Jose H. Blanchet
MingWen Liu
Li Xia
Jiheng Zhang
Qianchuan Zhao
Zhengyuan Zhou
OOD OffRL
103
26
0
14 Sep 2022
Efficiently Computing Nash Equilibria in Adversarial Team Markov Games
Fivos Kalogiannis
Ioannis Anagnostides
Ioannis Panageas
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Vaggos Chatziafratis
S. Stavroulakis
73
13
0
03 Aug 2022
Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Tian Xu
Ziniu Li
Yang Yu
Zhimin Luo
67
8
0
03 Aug 2022
Boosted Off-Policy Learning
Ben London
Levi Lu
Ted Sandler
Thorsten Joachims
OffRL
103
4
0
01 Aug 2022
Actor-Critic based Improper Reinforcement Learning
Mohammadi Zaki
Avinash Mohan
Aditya Gopalan
Shie Mannor
84
3
0
19 Jul 2022
Minimum Description Length Control
Theodore H. Moskovitz
Ta-Chu Kao
M. Sahani
M. Botvinick
80
1
0
17 Jul 2022
A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences
Han Shen
Tianyi Chen
127
15
0
21 Jun 2022
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
Qinbo Bai
Amrit Singh Bedi
Vaneet Aggarwal
104
24
0
12 Jun 2022
Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic
Qijun Luo
Xiao Li
119
1
0
12 Jun 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
Ruida Zhou
Tao-Wen Liu
D. Kalathil
P. R. Kumar
Chao Tian
78
15
0
10 Jun 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
Sihan Zeng
Thinh T. Doan
Justin Romberg
154
22
0
27 May 2022
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
Shicong Cen
Fan Chen
Yuejie Chi
100
15
0
12 Apr 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
82
4
0
21 Feb 2022
A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Youssef Diouane
Aurelien Lucchi
Vihang Patil
93
3
0
21 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
97
2
0
15 Feb 2022
Uncovering Instabilities in Variational-Quantum Deep Q-Networks
Maja Franz
Lucas Wolf
Maniraman Periyasamy
Christian Ufrecht
Daniel D. Scherer
Axel Plinge
Christopher Mutschler
Wolfgang Mauerer
135
30
0
10 Feb 2022
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Ruiqi Zhang
Xuezhou Zhang
Chengzhuo Ni
Mengdi Wang
OffRL
92
16
0
10 Feb 2022
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
Amrit Singh Bedi
Souradip Chakraborty
Anjaly Parayil
Brian M Sadler
Pratap Tokekar
Alec Koppel
149
17
0
28 Jan 2022
Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search
Wesley A Suttle
Alec Koppel
Ji Liu
73
0
0
21 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
101
14
0
18 Jan 2022
Block Policy Mirror Descent
Guanghui Lan
Yan Li
T. Zhao
OffRL
95
10
0
15 Jan 2022
A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning
Zhaolin Ren
Tianjun Zhang
Csaba Szepesvári
Bo Dai
117
20
0
22 Nov 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
70
10
0
04 Nov 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
114
12
0
04 Nov 2021
Policy Optimization for Constrained MDPs with Provable Fast Global Convergence
Tao-Wen Liu
Ruida Zhou
D. Kalathil
P. R. Kumar
Chao Tian
75
21
0
31 Oct 2021
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Matthew Shunshi Zhang
Murat A. Erdogdu
Animesh Garg
91
5
0
30 Oct 2021
Understanding the Effect of Stochasticity in Policy Optimization
Jincheng Mei
Bo Dai
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
93
19
0
29 Oct 2021
Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
Hsuan-Yu Yao
Kai-Chun Hu
Liang-Chun Ouyang
I-Chen Wu
103
1
0
26 Oct 2021
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes
Sihan Zeng
Thinh T. Doan
Justin Romberg
183
18
0
21 Oct 2021
Independent Natural Policy Gradient Always Converges in Markov Potential Games
Roy Fox
Stephen Marcus McAleer
W. Overman
Ioannis Panageas
90
49
0
20 Oct 2021
Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process
Tianjiao Li
Ziwei Guan
Shaofeng Zou
Tengyu Xu
Yingbin Liang
Guanghui Lan
80
30
0
20 Oct 2021