
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown

Title | Authors | Tags | Counts | Date
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning | Mizhaan Prajit Maniyar Akash Mondal Prashanth L.A. S. Bhatnagar | | 78 1 0 | 21 Apr 2023
Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning | Hector Kohler R. Akrour Philippe Preux | OffRL | 65 0 0 | 11 Apr 2023
Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems | Sihan Zeng Thinh T. Doan Justin Romberg | OffRL | 71 3 0 | 23 Mar 2023
Policy Mirror Descent Inherently Explores Action Space | Yan Li Guanghui Lan | OffRL | 132 8 0 | 08 Mar 2023
Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation | Gagan Khandate Siqi Shang Eric Chang Tristan L. Saidi Yang Liu Seth Matthew Dennis Johnson Adams M. Ciocarlie | | 108 32 0 | 06 Mar 2023
Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient | Xiangyuan Zhang Tamer Basar | | 84 20 0 | 25 Feb 2023
Best of Both Worlds Policy Optimization | Christoph Dann Chen-Yu Wei Julian Zimmert | | 103 12 0 | 18 Feb 2023
Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation | Qiwen Cui Jianchao Tan S. Du | | 129 24 0 | 07 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence | Carlo Alfano Rui Yuan Patrick Rebeschini | | 158 15 0 | 30 Jan 2023
Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic | Wesley A Suttle Amrit Singh Bedi Bhrij Patel Brian M Sadler Alec Koppel Dinesh Manocha | | 102 16 0 | 28 Jan 2023
Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems | Xin Liu Honghao Wei Lei Ying | | 125 6 0 | 13 Dec 2022
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning | Yizhou Zhang Guannan Qu Pan Xu Yiheng Lin Zaiwei Chen Adam Wierman | | 96 26 0 | 30 Nov 2022
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization | Mudit Gaur Vaneet Aggarwal Mridul Agarwal | MLT | 113 1 0 | 14 Nov 2022
Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence | S. Pattathil Jianchao Tan Asuman Ozdaglar | | 99 14 0 | 23 Oct 2022
Finite-time analysis of single-timescale actor-critic | Xu-yang Chen Lin Zhao | OffRL | 89 24 0 | 18 Oct 2022
On the convergence of policy gradient methods to Nash equilibria in general stochastic games | Angeliki Giannou Kyriakos Lotidis P. Mertikopoulos Emmanouil-Vasileios Vlatakis-Gkaragkounis | | 127 18 0 | 17 Oct 2022
Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games | Yan Chen Taoying Li | | 65 2 0 | 14 Oct 2022
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient | Yuda Song Yi Zhou Ayush Sekhari J. Andrew Bagnell A. Krishnamurthy Wen Sun | OffRL OnRL | 115 105 0 | 13 Oct 2022
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games | Shicong Cen Yuejie Chi S. Du Lin Xiao | | 136 38 0 | 03 Oct 2022
Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation | Xiaoteng Ma Zhipeng Liang Jose H. Blanchet MingWen Liu Li Xia Jiheng Zhang Qianchuan Zhao Zhengyuan Zhou | OOD OffRL | 103 26 0 | 14 Sep 2022
Efficiently Computing Nash Equilibria in Adversarial Team Markov Games | Fivos Kalogiannis Ioannis Anagnostides Ioannis Panageas Emmanouil-Vasileios Vlatakis-Gkaragkounis Vaggos Chatziafratis S. Stavroulakis | | 73 13 0 | 03 Aug 2022
Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis | Tian Xu Ziniu Li Yang Yu Zhimin Luo | | 67 8 0 | 03 Aug 2022
Boosted Off-Policy Learning | Ben London Levi Lu Ted Sandler Thorsten Joachims | OffRL | 103 4 0 | 01 Aug 2022
Actor-Critic based Improper Reinforcement Learning | Mohammadi Zaki Avinash Mohan Aditya Gopalan Shie Mannor | | 84 3 0 | 19 Jul 2022
Minimum Description Length Control | Theodore H. Moskovitz Ta-Chu Kao M. Sahani M. Botvinick | | 80 1 0 | 17 Jul 2022
A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences | Han Shen Tianyi Chen | | 127 15 0 | 21 Jun 2022
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm | Qinbo Bai Amrit Singh Bedi Vaneet Aggarwal | | 104 24 0 | 12 Jun 2022
Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic | Qijun Luo Xiao Li | | 119 1 0 | 12 Jun 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning | Ruida Zhou Tao-Wen Liu D. Kalathil P. R. Kumar Chao Tian | | 78 15 0 | 10 Jun 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games | Sihan Zeng Thinh T. Doan Justin Romberg | | 154 22 0 | 27 May 2022
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization | Shicong Cen Fan Chen Yuejie Chi | | 100 15 0 | 12 Apr 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes | Haoya Li Hsiang-Fu Yu Lexing Ying Inderjit Dhillon | | 82 4 0 | 21 Feb 2022
A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning | Youssef Diouane Aurelien Lucchi Vihang Patil | | 93 3 0 | 21 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms | Romain Laroche Rémi Tachet des Combes | | 97 2 0 | 15 Feb 2022
Uncovering Instabilities in Variational-Quantum Deep Q-Networks | Maja Franz Lucas Wolf Maniraman Periyasamy Christian Ufrecht Daniel D. Scherer Axel Plinge Christopher Mutschler Wolfgang Mauerer | | 135 30 0 | 10 Feb 2022
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory | Ruiqi Zhang Xuezhou Zhang Chengzhuo Ni Mengdi Wang | OffRL | 92 16 0 | 10 Feb 2022
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces | Amrit Singh Bedi Souradip Chakraborty Anjaly Parayil Brian M Sadler Pratap Tokekar Alec Koppel | | 149 17 0 | 28 Jan 2022
Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search | Wesley A Suttle Alec Koppel Ji Liu | | 73 0 0 | 21 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime | B. Kerimkulov J. Leahy David Siska Lukasz Szpruch | | 101 14 0 | 18 Jan 2022
Block Policy Mirror Descent | Guanghui Lan Yan Li T. Zhao | OffRL | 95 10 0 | 15 Jan 2022
A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning | Zhaolin Ren Tianjun Zhang Csaba Szepesvári Bo Dai | | 117 20 0 | 22 Nov 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization | Theodore H. Moskovitz Michael Arbel Jack Parker-Holder Aldo Pacchiano | | 70 10 0 | 04 Nov 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch | Shangtong Zhang Rémi Tachet des Combes Romain Laroche | | 114 12 0 | 04 Nov 2021
Policy Optimization for Constrained MDPs with Provable Fast Global Convergence | Tao-Wen Liu Ruida Zhou D. Kalathil P. R. Kumar Chao Tian | | 75 21 0 | 31 Oct 2021
Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings | Matthew Shunshi Zhang Murat A. Erdogdu Animesh Garg | | 91 5 0 | 30 Oct 2021
Understanding the Effect of Stochasticity in Policy Optimization | Jincheng Mei Bo Dai Chenjun Xiao Csaba Szepesvári Dale Schuurmans | | 93 19 0 | 29 Oct 2021
Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective | Nai-Chieh Huang Ping-Chun Hsieh Kuo-Hao Ho Hsuan-Yu Yao Kai-Chun Hu Liang-Chun Ouyang I-Chen Wu | | 103 1 0 | 26 Oct 2021
Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes | Sihan Zeng Thinh T. Doan Justin Romberg | | 183 18 0 | 21 Oct 2021
Independent Natural Policy Gradient Always Converges in Markov Potential Games | Roy Fox Stephen Marcus McAleer W. Overman Ioannis Panageas | | 90 49 0 | 20 Oct 2021
Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process | Tianjiao Li Ziwei Guan Shaofeng Zou Tengyu Xu Yingbin Liang Guanghui Lan | | 80 30 0 | 20 Oct 2021