ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.00261
  4. Cited By
On the Theory of Policy Gradient Methods: Optimality, Approximation, and
  Distribution Shift
v1v2v3v4v5 (latest)

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
ArXiv (abs)PDFHTML

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown
Title
Accelerating Nash Learning from Human Feedback via Mirror Prox
Accelerating Nash Learning from Human Feedback via Mirror Prox
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
64
0
0
26 May 2025
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh
Vaneet Aggarwal
48
0
0
26 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
52
0
0
23 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
109
0
0
16 May 2025
Infinite Horizon Markov Economies
Denizalp Goktas
Sadie Zhao
Yiling Chen
Amy Greenwald
69
1
0
22 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
234
6
0
07 Nov 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng
Zilin Huang
Sikai Chen
101
10
0
30 Aug 2024
Functional Acceleration for Policy Mirror Descent
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
121
0
0
23 Jul 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
85
13
0
21 Jun 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
130
2
0
11 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample
  Manipulation
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
94
2
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
147
2
0
30 May 2024
Momentum for the Win: Collaborative Federated Reinforcement Learning
  across Heterogeneous Environments
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Han Wang
Sihong He
Zhili Zhang
Fei Miao
James Anderson
99
4
0
29 May 2024
Recurrent Natural Policy Gradient for POMDPs
Recurrent Natural Policy Gradient for POMDPs
Semih Cayci
A. Eryilmaz
93
1
0
28 May 2024
Natural Policy Gradient and Actor Critic Methods for Constrained
  Multi-Task Reinforcement Learning
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Sihan Zeng
Thinh T. Doan
Justin Romberg
88
0
0
03 May 2024
Towards Global Optimality for Practical Average Reward Reinforcement
  Learning without Mixing Time Oracles
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
90
1
0
18 Mar 2024
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical
  Systems
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Wesley A Suttle
Vipul K Sharma
K. Kosaraju
S. Sivaranjani
Ji Liu
Vijay Gupta
Brian M Sadler
72
1
0
06 Mar 2024
Learning to Model Diverse Driving Behaviors in Highly Interactive
  Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Weiwei Liu
Wenxuan Hu
Wei Jing
Lanxin Lei
Lingping Gao
Yong Liu
95
2
0
21 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and
  RLHF
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
114
15
0
10 Feb 2024
Behind the Myth of Exploration in Policy Gradients
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
144
0
0
31 Jan 2024
R$\times$R: Rapid eXploration for Reinforcement Learning via
  Sampling-based Reset Distributions and Imitation Pre-training
R×\times×R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Gagan Khandate
Tristan L. Saidi
Siqi Shang
Eric T. Chang
Yang Liu
Seth Matthew Dennis
Johnson Adams
M. Ciocarlie
135
4
0
27 Jan 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning:
  Theory, Algorithms and Implementations
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
84
0
0
24 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for
  Regularized Expected Reward Optimization
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
75
1
0
23 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of
  Clipping
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
77
9
0
19 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy
  Regularization
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
124
3
0
23 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
On the Second-Order Convergence of Biased Policy Gradient Algorithms
Siqiao Mu
Diego Klabjan
96
2
0
05 Nov 2023
Model-Based Reparameterization Policy Gradient Methods: Theory and
  Practical Algorithms
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang
Boyi Liu
Zhaoran Wang
Tuo Zhao
73
2
0
30 Oct 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm
  with General Parameterization for Infinite Horizon Discounted Reward Markov
  Decision Processes
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
Washim Uddin Mondal
Vaneet Aggarwal
84
11
0
18 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method
  for Aligning Large Language Models
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
148
79
0
16 Oct 2023
Bi-Level Offline Policy Optimization with Limited Exploration
Bi-Level Offline Policy Optimization with Limited Exploration
Wenzhuo Zhou
OffRL
103
5
0
10 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
138
12
0
04 Oct 2023
On Representation Complexity of Model-based and Model-free Reinforcement
  Learning
On Representation Complexity of Model-based and Model-free Reinforcement Learning
Hanlin Zhu
Baihe Huang
Stuart Russell
OffRL
89
4
0
03 Oct 2023
Stackelberg Batch Policy Learning
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
84
1
0
28 Sep 2023
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in
  IBMDPs
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Hector Kohler
R. Akrour
Philippe Preux
OffRL
94
1
0
23 Sep 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon
  Average Reward Markov Decision Processes
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
61
13
0
05 Sep 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning
  from Human Feedback
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Dinesh Manocha
Huazheng Wang
Mengdi Wang
Furong Huang
122
27
0
03 Aug 2023
Learning to Generate Better Than Your LLM
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
82
49
0
20 Jun 2023
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Semih Cayci
A. Eryilmaz
81
2
0
20 Jun 2023
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Hang Wang
Sen Lin
Junshan Zhang
OffRLOnRL
93
3
0
20 Jun 2023
Acceleration in Policy Optimization
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
A. Guez
Doina Precup
Sebastian Flennerhag
104
0
0
18 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural
  Network Parametrization
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
94
3
0
18 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity
  Sampling
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
104
4
0
15 Jun 2023
On the Linear Convergence of Policy Gradient under Hadamard
  Parameterization
On the Linear Convergence of Policy Gradient under Hadamard Parameterization
Jiacai Liu
Jinchi Chen
Ke Wei
78
3
0
31 May 2023
Solving Robust MDPs through No-Regret Dynamics
Solving Robust MDPs through No-Regret Dynamics
E. Guha
80
0
0
30 May 2023
Decision-Aware Actor-Critic with Function Approximation and Theoretical
  Guarantees
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
97
4
0
24 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient
  Computation of Nash Equilibria
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Fivos Kalogiannis
Ioannis Panageas
97
8
0
23 May 2023
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
Taha Toghani
Sebastian Perez-Salazar
César A. Uribe
114
2
0
20 May 2023
Delay-Adapted Policy Optimization and Improved Regret for Adversarial
  MDP with Delayed Bandit Feedback
Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
Tal Lancewicki
Aviv A. Rosenberg
Dmitry Sotnikov
55
3
0
13 May 2023
Policy Gradient Algorithms Implicitly Optimize by Continuation
Policy Gradient Algorithms Implicitly Optimize by Continuation
Adrien Bolland
Gilles Louppe
D. Ernst
58
3
0
11 May 2023
Local Optimization Achieves Global Optimality in Multi-Agent
  Reinforcement Learning
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao
Zhuoran Yang
Zhaoran Wang
Jason D. Lee
79
3
0
08 May 2023
12345
Next