1908.00261
Cited By
v5 (latest)
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Annual Conference Computational Learning Theory (COLT), 2019
1 August 2019
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
Papers citing
"On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"
50 / 225 papers shown
Towards Formalizing Reinforcement Learning Theory
Shangtong Zhang
156
3
0
05 Nov 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
246
16
0
05 Oct 2025
Sampling Complexity of TD and PPO in RKHS
Lu Zou
Wendi Ren
Weizhong Zhang
Liang Ding
Shuang Li
156
1
0
29 Sep 2025
Proximal Point Nash Learning from Human Feedback
D. Tiapkin
Daniele Calandriello
Denis Belomestny
Eric Moulines
Alexey Naumov
Kashif Rasul
Michal Valko
Pierre Ménard
274
4
0
26 May 2025
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh
Vaneet Aggarwal
284
6
0
26 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
307
1
0
23 May 2025
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
619
0
0
16 May 2025
Infinite Horizon Markov Economies
Denizalp Goktas
Sadie Zhao
Yiling Chen
Amy Greenwald
299
1
0
22 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
702
19
0
07 Nov 2024
Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
Communications in Transportation Research (CTR), 2024
Zihao Sheng
Zilin Huang
Sikai Chen
305
22
0
30 Aug 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
386
2
0
23 Jul 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
307
21
0
21 Jun 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
350
6
0
11 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
342
9
0
31 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
502
10
0
30 May 2024
Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments
Han Wang
Sihong He
Zhili Zhang
Fei Miao
James Anderson
296
9
0
29 May 2024
Recurrent Natural Policy Gradient for POMDPs
Semih Cayci
A. Eryilmaz
389
3
0
28 May 2024
Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning
Sihan Zeng
Thinh T. Doan
Justin Romberg
238
0
0
03 May 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
International Conference on Machine Learning (ICML), 2024
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
539
4
0
18 Mar 2024
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Wesley A Suttle
Vipul K Sharma
K. Kosaraju
S. Sivaranjani
Ji Liu
Vijay Gupta
Brian M Sadler
231
3
0
06 Mar 2024
Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning
Weiwei Liu
Wenxuan Hu
Wei Jing
Lanxin Lei
Lingping Gao
Yong Liu
288
9
0
21 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
International Conference on Machine Learning (ICML), 2024
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
434
34
0
10 Feb 2024
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland
Gaspard Lambrechts
Damien Ernst
474
3
0
31 Jan 2024
R×R: Rapid eXploration for Reinforcement Learning via Sampling-based Reset Distributions and Imitation Pre-training
Gagan Khandate
Tristan L. Saidi
Siqi Shang
Eric T. Chang
Yang Liu
Seth Matthew Dennis
Johnson Adams
M. Ciocarlie
385
5
0
27 Jan 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
366
9
0
24 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
248
1
0
23 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
318
33
0
19 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
499
9
0
23 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
International Conference on Machine Learning (ICML), 2023
Siqiao Mu
Diego Klabjan
486
4
0
05 Nov 2023
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Shenao Zhang
Boyi Liu
Zhaoran Wang
Tuo Zhao
363
6
0
30 Oct 2023
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Washim Uddin Mondal
Vaneet Aggarwal
334
23
0
18 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
International Conference on Machine Learning (ICML), 2023
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Tian Ding
Zhimin Luo
563
162
0
16 Oct 2023
Bi-Level Offline Policy Optimization with Limited Exploration
Neural Information Processing Systems (NeurIPS), 2023
Wenzhuo Zhou
OffRL
319
5
0
10 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
459
17
0
04 Oct 2023
On Representation Complexity of Model-based and Model-free Reinforcement Learning
International Conference on Learning Representations (ICLR), 2023
Hanlin Zhu
Baihe Huang
Stuart Russell
OffRL
465
5
0
03 Oct 2023
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
340
1
0
28 Sep 2023
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Hector Kohler
R. Akrour
Philippe Preux
OffRL
544
1
0
23 Sep 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
405
23
0
05 Sep 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
International Conference on Learning Representations (ICLR), 2023
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Dinesh Manocha
Huazheng Wang
Mengdi Wang
Furong Huang
422
41
0
03 Aug 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
378
58
0
20 Jun 2023
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Neural Information Processing Systems (NeurIPS), 2023
Semih Cayci
A. Eryilmaz
289
7
0
20 Jun 2023
Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
International Conference on Machine Learning (ICML), 2023
Hang Wang
Sen Lin
Junshan Zhang
OffRL
OnRL
278
4
0
20 Jun 2023
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
A. Guez
Doina Precup
Sebastian Flennerhag
357
0
0
18 Jun 2023
On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
301
8
0
18 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
International Conference on Machine Learning (ICML), 2023
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
271
6
0
15 Jun 2023
On the Linear Convergence of Policy Gradient under Hadamard Parameterization
Information and Inference: A Journal of the IMA (JIII), 2023
Jiacai Liu
Jinchi Chen
Ke Wei
285
4
0
31 May 2023
Solving Robust MDPs through No-Regret Dynamics
E. Guha
363
0
0
30 May 2023
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees
Neural Information Processing Systems (NeurIPS), 2023
Sharan Vaswani
A. Kazemi
Reza Babanezhad
Nicolas Le Roux
OffRL
478
6
0
24 May 2023
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Neural Information Processing Systems (NeurIPS), 2023
Fivos Kalogiannis
Ioannis Panageas
432
8
0
23 May 2023
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
IEEE Conference on Decision and Control (CDC), 2023
Taha Toghani
Sebastian Perez-Salazar
César A. Uribe
296
2
0
20 May 2023
Page 1 of 5