On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

Annual Conference Computational Learning Theory (COLT), 2019
1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 225 papers shown
Towards Formalizing Reinforcement Learning Theory
Shangtong Zhang
155
3
0
05 Nov 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
239
16
0
05 Oct 2025
Sampling Complexity of TD and PPO in RKHS
Lu Zou
Wendi Ren
Weizhong Zhang
Liang Ding
Shuang Li
156
1
0
29 Sep 2025
Proximal Point Nash Learning from Human Feedback
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
271
4
0
26 May 2025
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh
Vaneet Aggarwal
276
6
0
26 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
305
1
0
23 May 2025
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
604
0
0
16 May 2025
Infinite Horizon Markov Economies
Denizalp Goktas
Sadie Zhao
Yiling Chen
Amy Greenwald
297
1
0
22 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
683
18
0
07 Nov 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Communications in Transportation Research (CTR), 2024
Zihao Sheng
Zilin Huang
Sikai Chen
293
19
0
30 Aug 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
379
2
0
23 Jul 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
305
20
0
21 Jun 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
348
6
0
11 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
326
9
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
499
10
0
30 May 2024
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Han Wang
Sihong He
Zhili Zhang
Fei Miao
James Anderson
289
9
0
29 May 2024
Recurrent Natural Policy Gradient for POMDPs
Semih Cayci
A. Eryilmaz
379
3
0
28 May 2024
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Sihan Zeng
Thinh T. Doan
Justin Romberg
238
0
0
03 May 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
International Conference on Machine Learning (ICML), 2024
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
535
4
0
18 Mar 2024
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Wesley A Suttle
Vipul K Sharma
K. Kosaraju
S. Sivaranjani
Ji Liu
Vijay Gupta
Brian M Sadler
230
3
0
06 Mar 2024
Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Weiwei Liu
Wenxuan Hu
Wei Jing
Lanxin Lei
Lingping Gao
Yong Liu
272
9
0
21 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
International Conference on Machine Learning (ICML), 2024
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
431
33
0
10 Feb 2024
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
467
3
0
31 Jan 2024
R$\times$R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Gagan Khandate
Tristan L. Saidi
Siqi Shang
Eric T. Chang
Yang Liu
Seth Matthew Dennis
Johnson Adams
M. Ciocarlie
382
5
0
27 Jan 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
355
9
0
24 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
237
1
0
23 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
313
30
0
19 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
499
9
0
23 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
International Conference on Machine Learning (ICML), 2023
Siqiao Mu
Diego Klabjan
485
4
0
05 Nov 2023
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Shenao Zhang
Boyi Liu
Zhaoran Wang
Tuo Zhao
359
6
0
30 Oct 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Washim Uddin Mondal
Vaneet Aggarwal
328
23
0
18 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
International Conference on Machine Learning (ICML), 2023
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Tian Ding
Zhimin Luo
548
158
0
16 Oct 2023
Bi-Level Offline Policy Optimization with Limited Exploration
Neural Information Processing Systems (NeurIPS), 2023
Wenzhuo Zhou
OffRL
316
5
0
10 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
457
17
0
04 Oct 2023
On Representation Complexity of Model-based and Model-free Reinforcement Learning
International Conference on Learning Representations (ICLR), 2023
Hanlin Zhu
Baihe Huang
Stuart Russell
OffRL
450
5
0
03 Oct 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
334
1
0
28 Sep 2023
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Hector Kohler
R. Akrour
Philippe Preux
OffRL
538
1
0
23 Sep 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
404
23
0
05 Sep 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
International Conference on Learning Representations (ICLR), 2023
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Dinesh Manocha
Huazheng Wang
Mengdi Wang
Furong Huang
422
41
0
03 Aug 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
368
58
0
20 Jun 2023
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Neural Information Processing Systems (NeurIPS), 2023
Semih Cayci
A. Eryilmaz
285
6
0
20 Jun 2023
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
International Conference on Machine Learning (ICML), 2023
Hang Wang
Sen Lin
Junshan Zhang
OffRL
OnRL
271
4
0
20 Jun 2023
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
A. Guez
Doina Precup
Sebastian Flennerhag
356
0
0
18 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
296
8
0
18 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
International Conference on Machine Learning (ICML), 2023
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
268
6
0
15 Jun 2023
On the Linear Convergence of Policy Gradient under Hadamard Parameterization
Information and Inference: A Journal of the IMA (JIII), 2023
Jiacai Liu
Jinchi Chen
Ke Wei
284
4
0
31 May 2023
Solving Robust MDPs through No-Regret Dynamics
E. Guha
363
0
0
30 May 2023
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Neural Information Processing Systems (NeurIPS), 2023
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
469
6
0
24 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Neural Information Processing Systems (NeurIPS), 2023
Fivos Kalogiannis
Ioannis Panageas
429
8
0
23 May 2023
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
IEEE Conference on Decision and Control (CDC), 2023
Taha Toghani
Sebastian Perez-Salazar
César A. Uribe
296
2
0
20 May 2023