On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown
Title
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
Yonathan Efroni
Dipendra Kumar Misra
A. Krishnamurthy
Alekh Agarwal
John Langford
OffRL
96
23
0
17 Oct 2021
Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees
Siliang Zeng
Tianyi Chen
Alfredo García
Mingyi Hong
100
11
0
11 Oct 2021
Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games
Bora Yongacoglu
Gürdal Arslan
S. Yüksel
69
16
0
09 Oct 2021
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
93
3
0
05 Oct 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates
Romain Laroche
Rémi Tachet des Combes
102
8
0
29 Sep 2021
Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning
Carlo Alfano
Patrick Rebeschini
93
5
0
23 Sep 2021
Reinforcement Learning for Load-balanced Parallel Particle Tracing
Jiayi Xu
Hanqi Guo
Han-Wei Shen
Mukund Raj
Skylar W. Wurster
Tom Peterka
32
6
0
13 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
104
119
0
19 Aug 2021
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
Ziheng Wang
Justin A. Sirignano
82
2
0
19 Aug 2021
A general class of surrogate functions for stable and efficient reinforcement learning
Sharan Vaswani
Olivier Bachem
Simone Totaro
Robert Mueller
Shivam Garg
Matthieu Geist
Marlos C. Machado
Pablo Samuel Castro
Nicolas Le Roux
OffRL
124
16
0
12 Aug 2021
Variational Actor-Critic Algorithms
Yuhua Zhu
Lexing Ying
OffRL
72
0
0
03 Aug 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
124
45
0
18 Jul 2021
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Masatoshi Uehara
Wen Sun
OffRL
202
150
0
13 Jul 2021
Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning
Lingwei Zhu
Toshinori Kitamura
Takamitsu Matsubara
OffRL
59
1
0
13 Jul 2021
Curious Explorer: a provable exploration strategy in Policy Learning
M. Miani
Maurizio Parton
M. Romito
131
0
0
29 Jun 2021
Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems
Tianyi Chen
Yuejiao Sun
W. Yin
84
33
0
25 Jun 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Rémi Munos
Michal Valko
OffRL
136
9
0
24 Jun 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
90
15
0
15 Jun 2021
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Dhruv Malik
Aldo Pacchiano
Vishwak Srinivasan
Yuanzhi Li
57
6
0
15 Jun 2021
On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
Yiming Zhang
George Andriopoulos
OffRL
97
43
0
14 Jun 2021
Markov Decision Processes with Long-Term Average Constraints
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
56
6
0
12 Jun 2021
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
Semih Cayci
Niao He
R. Srikant
113
36
0
08 Jun 2021
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
Stefanos Leonardos
W. Overman
Ioannis Panageas
Georgios Piliouras
109
123
0
03 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
Jiawei Huang
Nan Jiang
100
5
0
02 Jun 2021
Reward is enough for convex MDPs
Tom Zahavy
Brendan O'Donoghue
Guillaume Desjardins
Satinder Singh
141
76
0
01 Jun 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Qinbo Bai
Mridul Agarwal
Vaneet Aggarwal
58
7
0
28 May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Wenhao Zhan
Shicong Cen
Baihe Huang
Yuxin Chen
Jason D. Lee
Yuejie Chi
100
78
0
24 May 2021
Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Gen Li
Yuxin Chen
Yuejie Chi
Yuantao Gu
Yuting Wei
OffRL
92
30
0
17 May 2021
Leveraging Non-uniformity in First-order Non-convex Optimization
Jincheng Mei
Yue Gao
Bo Dai
Csaba Szepesvári
Dale Schuurmans
92
50
0
13 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm
S. Khodadadian
P. Jhunjhunwala
Sushil Mahavir Varma
S. T. Maguluri
95
57
0
04 May 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
122
53
0
24 Mar 2021
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Paria Rashidinejad
Banghua Zhu
Cong Ma
Jiantao Jiao
Stuart J. Russell
OffRL
303
291
0
22 Mar 2021
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme
Konstantin Avrachenkov
Vivek Borkar
H. Dolhare
K. Patil
60
9
0
10 Mar 2021
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards
Miguel Calvo-Fullana
Santiago Paternain
Luiz F. O. Chamon
Alejandro Ribeiro
OffRL
86
34
0
23 Feb 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
127
25
0
23 Feb 2021
Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization
Jyun-Li Lin
Wei-Ting Hung
Shangtong Yang
Ping-Chun Hsieh
Xi Liu
122
14
0
22 Feb 2021
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
Yulai Zhao
Yuandong Tian
Jason D. Lee
S. Du
OffRL
76
18
0
17 Feb 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Junyu Zhang
Chengzhuo Ni
Zheng Yu
Csaba Szepesvári
Mengdi Wang
133
69
0
17 Feb 2021
Improper Reinforcement Learning with Gradient-based Policy Optimization
Mohammadi Zaki
Avinash Mohan
Aditya Gopalan
Shie Mannor
44
0
0
16 Feb 2021
Online Apprenticeship Learning
Lior Shani
Tom Zahavy
Shie Mannor
OffRL
100
27
0
13 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
62
11
0
11 Feb 2021
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
Kefan Dong
Jiaqi Yang
Tengyu Ma
102
33
0
08 Feb 2021
Multi-Agent Reinforcement Learning with Temporal Logic Specifications
Lewis Hammond
Alessandro Abate
Julian Gutierrez
Michael Wooldridge
AI4CE
108
33
0
01 Feb 2021
Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges
Xin Chen
Guannan Qu
Yujie Tang
S. Low
Na Li
88
241
0
27 Jan 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm
S. Khodadadian
Thinh T. Doan
Justin Romberg
S. T. Maguluri
104
43
0
26 Jan 2021
Independent Policy Gradient Methods for Competitive Reinforcement Learning
C. Daskalakis
Dylan J. Foster
Noah Golowich
246
163
0
11 Jan 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint
Nithia Vijayan
Prashanth L. A.
OffRL
77
7
0
06 Jan 2021
Robust Asymmetric Learning in POMDPs
Andrew Warrington
J. Lavington
Adam Scibior
Mark Schmidt
Frank Wood
78
32
0
31 Dec 2020
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup
Han Shen
Jianchao Tan
Min-Fong Hong
Tianyi Chen
90
30
0
31 Dec 2020
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
Han Zhong
Xun Deng
Ethan X. Fang
Zhuoran Yang
Zhaoran Wang
Runze Li
71
3
0
28 Dec 2020