Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

19 June 2019

Papers citing "Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies"

50 / 67 papers shown

Title
Reinforcement Learning with Random Time Horizons Enric Ribera Borrell Lorenz Richter Christof Schütte AI4TS 30 0 0 01 Jun 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates Jincheng Mei Bo Dai Alekh Agarwal Sharan Vaswani Anant Raj Csaba Szepesvári Dale Schuurmans 134 0 0 11 Feb 2025
Structure Matters: Dynamic Policy Gradient Sara Klein Xiangyuan Zhang Tamer Basar Simon Weissmann Leif Döring 59 0 0 07 Nov 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity Yan Yang Bin Gao Ya-xiang Yuan 133 2 0 30 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination Simon Weissmann Sara Klein Waïss Azizian Leif Döring 86 3 0 22 May 2024
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis Guangchen Lan Dong-Jun Han Abolfazl Hashemi Vaneet Aggarwal Christopher G. Brinton 228 16 0 09 Apr 2024
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate Yifan Lin Yuhao Wang Enlu Zhou 139 0 0 01 Mar 2024
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces B. Kerimkulov J. Leahy David Siska Lukasz Szpruch Yufei Zhang 124 12 0 04 Oct 2023
Understanding the Complexity Gains of Single-Task RL with a Curriculum Qiyang Li Yuexiang Zhai Yi-An Ma Sergey Levine 112 16 0 24 Dec 2022
An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods Yanli Liu Kai Zhang Tamer Basar W. Yin 111 110 0 15 Nov 2022
Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence S. Pattathil Kai Zhang Asuman Ozdaglar 94 14 0 23 Oct 2022
On the convergence of policy gradient methods to Nash equilibria in general stochastic games Angeliki Giannou Kyriakos Lotidis P. Mertikopoulos Emmanouil-Vasileios Vlatakis-Gkaragkounis 124 18 0 17 Oct 2022
Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games Yan Chen Taoying Li 51 2 0 14 Oct 2022
RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments Aakriti Agrawal Amrit Singh Bedi Tianyi Zhou 116 20 0 13 Sep 2022
Sampling Through the Lens of Sequential Decision Making J. Dou Alvin Pan Runxue Bao Haiyi Mao Lei Luo Zhi-Hong Mao 96 19 0 17 Aug 2022
A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences Han Shen Tianyi Chen 105 15 0 21 Jun 2022
How are policy gradient methods affected by the limits of control? Ingvar M. Ziemann Anastasios Tsiamis H. Sandberg Nikolai Matni 57 14 0 14 Jun 2022
Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization Maxim Kaledin Alexander Golubev Denis Belomestny OffRL 82 4 0 14 Jun 2022
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm Qinbo Bai Amrit Singh Bedi Vaneet Aggarwal 78 24 0 12 Jun 2022
Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic Qijun Luo Xiao Li 102 1 0 12 Jun 2022
Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies Souradip Chakraborty Amrit Singh Bedi Alec Koppel Pratap Tokekar Tianyi Zhou 79 8 0 12 Jun 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs Dongsheng Ding Kai Zhang Jiali Duan Tamer Basar Mihailo R. Jovanović 73 21 0 06 Jun 2022
A Small Gain Analysis of Single Timescale Actor Critic Alexander Olshevsky Bahman Gharesifard 104 20 0 04 Mar 2022
A policy gradient approach for optimization of smooth risk measures Nithia Vijayan Prashanth L.A. OffRL 50 4 0 22 Feb 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms Romain Laroche Rémi Tachet des Combes 94 2 0 15 Feb 2022
Do Differentiable Simulators Give Better Policy Gradients? H.J. Terry Suh Max Simchowitz Kai Zhang Russ Tedrake 84 101 0 02 Feb 2022
Recent Advances in Reinforcement Learning in Finance B. Hambly Renyuan Xu Huining Yang OffRL 126 180 0 08 Dec 2021
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch Shangtong Zhang Rémi Tachet des Combes Romain Laroche 107 12 0 04 Nov 2021
Understanding the Effect of Stochasticity in Policy Optimization Jincheng Mei Bo Dai Chenjun Xiao Csaba Szepesvári Dale Schuurmans 70 19 0 29 Oct 2021
Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming Alec Koppel Amrit Singh Bedi Bhargav Ganguly Vaneet Aggarwal 51 4 0 22 Oct 2021
Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization Yuhao Ding Junzi Zhang Hyunin Lee Javad Lavaei 111 19 0 19 Oct 2021
Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees Siliang Zeng Tianyi Chen Alfredo García Mingyi Hong 92 11 0 11 Oct 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods Xin Guo Anran Hu Junzi Zhang OffRL 86 6 0 13 Sep 2021
Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach Haotian Gu Xin Guo Xiaoli Wei Renyuan Xu OOD 99 36 0 05 Aug 2021
A general sample complexity analysis of vanilla policy gradient Rui Yuan Robert Mansel Gower A. Lazaric 118 64 0 23 Jul 2021
Policy Gradient Methods for Distortion Risk Measures Nithia Vijayan Prashanth L.A. 131 5 0 09 Jul 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control Amrit Singh Bedi Anjaly Parayil Junyu Zhang Mengdi Wang Alec Koppel 88 15 0 15 Jun 2021
Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation Anas Barakat Pascal Bianchi Julien Lehmann 91 9 0 14 Jun 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm Qinbo Bai Mridul Agarwal Vaneet Aggarwal 45 7 0 28 May 2021
A nearly Blackwell-optimal policy gradient method Vektor Dewanto M. Gallagher OffRL 35 0 0 28 May 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality Tengyu Xu Zhuoran Yang Zhaoran Wang Yingbin Liang OffRL 104 25 0 23 Feb 2021
Softmax Policy Gradient Methods Can Take Exponential Time to Converge Gen Li Yuting Wei Yuejie Chi Yuxin Chen 109 53 0 22 Feb 2021
Provable Super-Convergence with a Large Cyclical Learning Rate Samet Oymak 62 12 0 22 Feb 2021
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games Yulai Zhao Yuandong Tian Jason D. Lee S. Du OffRL 76 18 0 17 Feb 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method Junyu Zhang Chengzhuo Ni Zheng Yu Csaba Szepesvári Mengdi Wang 125 69 0 17 Feb 2021
Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity Kai Zhang Xiangyuan Zhang Bin Hu Tamer Basar 106 19 0 04 Jan 2021
Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents A. Ghosh Vaneet Aggarwal 105 7 0 31 Dec 2020
Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning Matthieu Zimmer Claire Glanois Umer Siddique Paul Weng OffRL 164 60 0 17 Dec 2020
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points Long Yang Qian Zheng Gang Pan 100 21 0 02 Dec 2020
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee Tengyu Xu Yingbin Liang Guanghui Lan 89 128 0 11 Nov 2020