1908.00261
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
Papers citing
"On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"
50 / 222 papers shown
Accelerating Nash Learning from Human Feedback via Mirror Prox
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
64
0
0
26 May 2025
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh
Vaneet Aggarwal
48
0
0
26 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
52
0
0
23 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
109
0
0
16 May 2025
Infinite Horizon Markov Economies
Denizalp Goktas
Sadie Zhao
Yiling Chen
Amy Greenwald
69
1
0
22 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
234
6
0
07 Nov 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Zihao Sheng
Zilin Huang
Sikai Chen
101
10
0
30 Aug 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
121
0
0
23 Jul 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
85
13
0
21 Jun 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
130
2
0
11 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
94
2
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
147
2
0
30 May 2024
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Han Wang
Sihong He
Zhili Zhang
Fei Miao
James Anderson
99
4
0
29 May 2024
Recurrent Natural Policy Gradient for POMDPs
Semih Cayci
A. Eryilmaz
93
1
0
28 May 2024
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Sihan Zeng
Thinh T. Doan
Justin Romberg
88
0
0
03 May 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
90
1
0
18 Mar 2024
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Wesley A Suttle
Vipul K Sharma
K. Kosaraju
S. Sivaranjani
Ji Liu
Vijay Gupta
Brian M Sadler
72
1
0
06 Mar 2024
Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Weiwei Liu
Wenxuan Hu
Wei Jing
Lanxin Lei
Lingping Gao
Yong Liu
95
2
0
21 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
114
15
0
10 Feb 2024
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
144
0
0
31 Jan 2024
R×R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Gagan Khandate
Tristan L. Saidi
Siqi Shang
Eric T. Chang
Yang Liu
Seth Matthew Dennis
Johnson Adams
M. Ciocarlie
135
4
0
27 Jan 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
84
0
0
24 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
75
1
0
23 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
77
9
0
19 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
124
3
0
23 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
Siqiao Mu
Diego Klabjan
96
2
0
05 Nov 2023
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang
Boyi Liu
Zhaoran Wang
Tuo Zhao
73
2
0
30 Oct 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
Washim Uddin Mondal
Vaneet Aggarwal
84
11
0
18 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
148
79
0
16 Oct 2023
Bi-Level Offline Policy Optimization with Limited Exploration
Wenzhuo Zhou
OffRL
103
5
0
10 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
138
12
0
04 Oct 2023
On Representation Complexity of Model-based and Model-free Reinforcement Learning
Hanlin Zhu
Baihe Huang
Stuart Russell
OffRL
89
4
0
03 Oct 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
84
1
0
28 Sep 2023
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Hector Kohler
R. Akrour
Philippe Preux
OffRL
94
1
0
23 Sep 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
61
13
0
05 Sep 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Dinesh Manocha
Huazheng Wang
Mengdi Wang
Furong Huang
122
27
0
03 Aug 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
82
49
0
20 Jun 2023
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Semih Cayci
A. Eryilmaz
81
2
0
20 Jun 2023
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Hang Wang
Sen Lin
Junshan Zhang
OffRL
OnRL
93
3
0
20 Jun 2023
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
A. Guez
Doina Precup
Sebastian Flennerhag
104
0
0
18 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
94
3
0
18 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
104
4
0
15 Jun 2023
On the Linear Convergence of Policy Gradient under Hadamard Parameterization
Jiacai Liu
Jinchi Chen
Ke Wei
78
3
0
31 May 2023
Solving Robust MDPs through No-Regret Dynamics
E. Guha
80
0
0
30 May 2023
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
97
4
0
24 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Fivos Kalogiannis
Ioannis Panageas
97
8
0
23 May 2023
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
Taha Toghani
Sebastian Perez-Salazar
César A. Uribe
114
2
0
20 May 2023
Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
Tal Lancewicki
Aviv A. Rosenberg
Dmitry Sotnikov
55
3
0
13 May 2023
Policy Gradient Algorithms Implicitly Optimize by Continuation
Adrien Bolland
Gilles Louppe
D. Ernst
58
3
0
11 May 2023
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Yulai Zhao
Zhuoran Yang
Zhaoran Wang
Jason D. Lee
79
3
0
08 May 2023