ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.02997
  4. Cited By
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

4 November 2021
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
ArXivPDFHTML

Papers citing "Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch"

12 / 12 papers shown
Title
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Zixuan Xie
Xinyu Liu
Rohan Chandra
Shangtong Zhang
5
0
0
27 May 2025
A Finite-Sample Analysis of Payoff-Based Independent Learning in
  Zero-Sum Stochastic Games
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Zaiwei Chen
Kai Zhang
Eric Mazumdar
Asuman Ozdaglar
Adam Wierman
79
6
0
03 Mar 2023
Global Convergence of Localized Policy Iteration in Networked
  Multi-Agent Reinforcement Learning
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Yizhou Zhang
Guannan Qu
Pan Xu
Yiheng Lin
Zaiwei Chen
Adam Wierman
44
26
0
30 Nov 2022
Robust Constrained Reinforcement Learning
Robust Constrained Reinforcement Learning
Yue Wang
Fei Miao
Shaofeng Zou
42
13
0
14 Sep 2022
Policy Gradient Method For Robust Reinforcement Learning
Policy Gradient Method For Robust Reinforcement Learning
Yue Wang
Shaofeng Zou
81
71
0
15 May 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in
  Actor-Critic Algorithms
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
53
2
0
15 Feb 2022
On the Convergence of SARSA with Linear Function Approximation
On the Convergence of SARSA with Linear Function Approximation
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
34
11
0
14 Feb 2022
STOPS: Short-Term-based Volatility-controlled Policy Search and its
  Global Convergence
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence
Liang Xu
Daoming Lyu
Yangchen Pan
Aiwen Jiang
Bo Liu
44
0
0
24 Jan 2022
Truncated Emphatic Temporal Difference Methods for Prediction and
  Control
Truncated Emphatic Temporal Difference Methods for Prediction and Control
Shangtong Zhang
Shimon Whiteson
OffRL
28
12
0
11 Aug 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
78
26
0
18 Feb 2021
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu
113
147
0
04 May 2020
On the Sample Complexity of Actor-Critic Method for Reinforcement
  Learning with Function Approximation
On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation
Harshat Kumar
Alec Koppel
Alejandro Ribeiro
106
80
0
18 Oct 2019
1