ResearchTrend.AI
A Finite Time Analysis of Two Time-Scale Actor Critic Methods

4 May 2020
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu

Papers citing "A Finite Time Analysis of Two Time-Scale Actor Critic Methods"

28 / 28 papers shown
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Zixuan Xie
Xinyu Liu
Rohan Chandra
Shangtong Zhang
24
0
0
27 May 2025
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting
Millend Roy
Vladimir Pyltsov
Yinbo Hu
65
0
0
16 May 2025
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Siddharth Chandak
76
1
0
18 Jan 2025
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur
Amrit Singh Bedi
Raghu Pasupathy
Vaneet Aggarwal
56
0
0
21 Oct 2024
A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee
Mo Zhou
Jian-Xiong Lu
58
8
0
11 Feb 2023
A policy gradient approach for Finite Horizon Constrained Markov Decision Processes
Soumyajit Guin
S. Bhatnagar
58
8
0
10 Oct 2022
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
67
11
0
04 Nov 2021
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
55
58
0
07 May 2020
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise
Maxim Kaledin
Eric Moulines
A. Naumov
V. Tadic
Hoi-To Wai
33
73
0
04 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
34
278
0
12 Dec 2019
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Pan Xu
Quanquan Gu
44
66
0
10 Dec 2019
Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples
Tengyu Xu
Shaofeng Zou
Yingbin Liang
49
73
0
26 Sep 2019
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu
F. Gao
Quanquan Gu
57
86
0
18 Sep 2019
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Lingxiao Wang
Qi Cai
Zhuoran Yang
Zhaoran Wang
47
239
0
29 Aug 2019
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning
Harsh Gupta
R. Srikant
Lei Ying
41
85
0
14 Jul 2019
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Zhuoran Yang
Yongxin Chen
Mingyi Hong
Zhaoran Wang
76
39
0
14 Jul 2019
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Kai Zhang
Alec Koppel
Haoqi Zhu
Tamer Basar
56
187
0
19 Jun 2019
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory
Bin Hu
U. Syed
53
58
0
16 Jun 2019
An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
Pan Xu
F. Gao
Quanquan Gu
44
94
0
29 May 2019
Finite-Sample Analysis for SARSA with Linear Function Approximation
Shaofeng Zou
Tengyu Xu
Yingbin Liang
46
147
0
06 Feb 2019
Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning
R. Srikant
Lei Ying
55
249
0
03 Feb 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
50
801
0
10 Jul 2018
Stochastic Variance-Reduced Policy Gradient
Matteo Papini
Damiano Binaghi
Giuseppe Canonaco
Matteo Pirotta
Marcello Restelli
45
174
0
14 Jun 2018
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Jalaj Bhandari
Daniel Russo
Raghav Singal
91
336
0
06 Jun 2018
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
79
757
0
03 Nov 2016
An Actor-Critic Algorithm for Sequence Prediction
Dzmitry Bahdanau
Philemon Brakel
Kelvin Xu
Anirudh Goyal
Ryan J. Lowe
Joelle Pineau
Aaron Courville
Yoshua Bengio
94
637
0
24 Jul 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
159
8,805
0
04 Feb 2016
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
237
6,722
0
19 Feb 2015