
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

1 August 2019

Papers citing "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift"

50 / 222 papers shown

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics Yonathan Efroni Dipendra Kumar Misra A. Krishnamurthy Alekh Agarwal John Langford OffRL 96 23 0 17 Oct 2021
Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees Siliang Zeng Tianyi Chen Alfredo García Mingyi Hong 100 11 0 11 Oct 2021
Satisficing Paths and Independent Multi-Agent Reinforcement Learning in Stochastic Games Bora Yongacoglu Gürdal Arslan S. Yüksel 69 16 0 09 Oct 2021
Approximate Newton policy gradient algorithms Haoya Li Samarth Gupta Hsiangfu Yu Lexing Ying Inderjit Dhillon 93 3 0 05 Oct 2021
Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates Romain Laroche Rémi Tachet des Combes 102 8 0 29 Sep 2021
Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning Carlo Alfano Patrick Rebeschini 93 5 0 23 Sep 2021
Reinforcement Learning for Load-balanced Parallel Particle Tracing Jiayi Xu Hanqi Guo Han-Wei Shen Mukund Raj Skylar W. Wurster Tom Peterka 32 6 0 13 Sep 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning Andrea Zanette Martin J. Wainwright Emma Brunskill OffRL 104 119 0 19 Aug 2021
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning Ziheng Wang Justin A. Sirignano 82 2 0 19 Aug 2021
A general class of surrogate functions for stable and efficient reinforcement learning Sharan Vaswani Olivier Bachem Simone Totaro Robert Mueller Shivam Garg Matthieu Geist Marlos C. Machado Pablo Samuel Castro Nicolas Le Roux OffRL 124 16 0 12 Aug 2021
Variational Actor-Critic Algorithms Yuhua Zhu Lexing Ying OffRL 72 0 0 03 Aug 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses Haipeng Luo Chen-Yu Wei Chung-Wei Lee 124 45 0 18 Jul 2021
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage Masatoshi Uehara Wen Sun OffRL 202 150 0 13 Jul 2021
Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning Lingwei Zhu Toshinori Kitamura Takamitsu Matsubara OffRL 59 1 0 13 Jul 2021
Curious Explorer: a provable exploration strategy in Policy Learning M. Miani Maurizio Parton M. Romito 131 0 0 29 Jun 2021
Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems Tianyi Chen Yuejiao Sun W. Yin 84 33 0 25 Jun 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation Yunhao Tang Tadashi Kozuno Mark Rowland Rémi Munos Michal Valko OffRL 136 9 0 24 Jun 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control Amrit Singh Bedi Anjaly Parayil Junyu Zhang Mengdi Wang Alec Koppel 90 15 0 15 Jun 2021
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity Dhruv Malik Aldo Pacchiano Vishwak Srinivasan Yuanzhi Li 57 6 0 15 Jun 2021
On-Policy Deep Reinforcement Learning for the Average-Reward Criterion Yiming Zhang George Andriopoulos OffRL 97 43 0 14 Jun 2021
Markov Decision Processes with Long-Term Average Constraints Mridul Agarwal Qinbo Bai Vaneet Aggarwal 56 6 0 12 Jun 2021
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation Semih Cayci Niao He R. Srikant 113 36 0 08 Jun 2021
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games Stefanos Leonardos W. Overman Ioannis Panageas Georgios Piliouras 109 123 0 03 Jun 2021
On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction Jiawei Huang Nan Jiang 100 5 0 02 Jun 2021
Reward is enough for convex MDPs Tom Zahavy Brendan O'Donoghue Guillaume Desjardins Satinder Singh 141 76 0 01 Jun 2021
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm Qinbo Bai Mridul Agarwal Vaneet Aggarwal 58 7 0 28 May 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence Wenhao Zhan Shicong Cen Baihe Huang Yuxin Chen Jason D. Lee Yuejie Chi 100 78 0 24 May 2021
Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting Gen Li Yuxin Chen Yuejie Chi Yuantao Gu Yuting Wei OffRL 92 30 0 17 May 2021
Leveraging Non-uniformity in First-order Non-convex Optimization Jincheng Mei Yue Gao Bo Dai Csaba Szepesvári Dale Schuurmans 92 50 0 13 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm S. Khodadadian P. Jhunjhunwala Sushil Mahavir Varma S. T. Maguluri 95 57 0 04 May 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation Andrea Zanette Ching-An Cheng Alekh Agarwal 122 53 0 24 Mar 2021
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism Paria Rashidinejad Banghua Zhu Cong Ma Jiantao Jiao Stuart J. Russell OffRL 303 291 0 22 Mar 2021
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme Konstantin Avrachenkov Vivek Borkar H. Dolhare K. Patil 60 9 0 10 Mar 2021
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards Miguel Calvo-Fullana Santiago Paternain Luiz F. O. Chamon Alejandro Ribeiro OffRL 86 34 0 23 Feb 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality Tengyu Xu Zhuoran Yang Zhaoran Wang Yingbin Liang OffRL 127 25 0 23 Feb 2021
Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Policy Optimization Jyun-Li Lin Wei-Ting Hung Shangtong Yang Ping-Chun Hsieh Xi Liu 122 14 0 22 Feb 2021
Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games Yulai Zhao Yuandong Tian Jason D. Lee S. Du OffRL 76 18 0 17 Feb 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method Junyu Zhang Chengzhuo Ni Zheng Yu Csaba Szepesvári Mengdi Wang 133 69 0 17 Feb 2021
Improper Reinforcement Learning with Gradient-based Policy Optimization Mohammadi Zaki Avinash Mohan Aditya Gopalan Shie Mannor 44 0 0 16 Feb 2021
Online Apprenticeship Learning Lior Shani Tom Zahavy Shie Mannor OffRL 100 27 0 13 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration N. Lazić Botao Hao Yasin Abbasi-Yadkori Dale Schuurmans Csaba Szepesvári 62 11 0 11 Feb 2021
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature Kefan Dong Jiaqi Yang Tengyu Ma 102 33 0 08 Feb 2021
Multi-Agent Reinforcement Learning with Temporal Logic Specifications Lewis Hammond Alessandro Abate Julian Gutierrez Michael Wooldridge AI4CE 108 33 0 01 Feb 2021
Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges Xin Chen Guannan Qu Yujie Tang S. Low Na Li 88 241 0 27 Jan 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm S. Khodadadian Thinh T. Doan Justin Romberg S. T. Maguluri 104 43 0 26 Jan 2021
Independent Policy Gradient Methods for Competitive Reinforcement Learning C. Daskalakis Dylan J. Foster Noah Golowich 246 163 0 11 Jan 2021
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint Nithia Vijayan Prashanth L. A. OffRL 77 7 0 06 Jan 2021
Robust Asymmetric Learning in POMDPs Andrew Warrington J. Lavington Adam Scibior Mark Schmidt Frank Wood 78 32 0 31 Dec 2020
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup Han Shen Jianchao Tan Min-Fong Hong Tianyi Chen 90 30 0 31 Dec 2020
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy Han Zhong Xun Deng Ethan X. Fang Zhuoran Yang Zhaoran Wang Runze Li 71 3 0 28 Dec 2020