Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

4 November 2021

Papers citing "Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch"

Title | Authors | Tags | Counts | Date
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features | Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang | | 5 0 0 | 27 May 2025
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games | Zaiwei Chen, Kai Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman | | 79 6 0 | 03 Mar 2023
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning | Yizhou Zhang, Guannan Qu, Pan Xu, Yiheng Lin, Zaiwei Chen, Adam Wierman | | 44 26 0 | 30 Nov 2022
Robust Constrained Reinforcement Learning | Yue Wang, Fei Miao, Shaofeng Zou | | 42 13 0 | 14 Sep 2022
Policy Gradient Method For Robust Reinforcement Learning | Yue Wang, Shaofeng Zou | | 81 71 0 | 15 May 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms | Romain Laroche, Rémi Tachet des Combes | | 53 2 0 | 15 Feb 2022
On the Convergence of SARSA with Linear Function Approximation | Shangtong Zhang, Rémi Tachet des Combes, Romain Laroche | | 34 11 0 | 14 Feb 2022
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence | Liang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu | | 44 0 0 | 24 Jan 2022
Truncated Emphatic Temporal Difference Methods for Prediction and Control | Shangtong Zhang, Shimon Whiteson | OffRL | 28 12 0 | 11 Aug 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm | S. Khodadadian, Zaiwei Chen, S. T. Maguluri | CML, OffRL | 78 26 0 | 18 Feb 2021
A Finite Time Analysis of Two Time-Scale Actor Critic Methods | Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu | | 113 147 0 | 04 May 2020
On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation | Harshat Kumar, Alec Koppel, Alejandro Ribeiro | | 106 80 0 | 18 Oct 2019