2005.01350
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
4 May 2020
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu
Papers citing
"A Finite Time Analysis of Two Time-Scale Actor Critic Methods"
28 / 28 papers shown
Title
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
Zixuan Xie
Xinyu Liu
Rohan Chandra
Shangtong Zhang
24
0
0
27 May 2025
IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting
Millend Roy
Vladimir Pyltsov
Yinbo Hu
65
0
0
16 May 2025
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Siddharth Chandak
76
1
0
18 Jan 2025
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur
Amrit Singh Bedi
Raghu Pasupathy
Vaneet Aggarwal
56
0
0
21 Oct 2024
A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee
Mo Zhou
Jianfeng Lu
58
8
0
11 Feb 2023
A policy gradient approach for Finite Horizon Constrained Markov Decision Processes
Soumyajit Guin
S. Bhatnagar
58
8
0
10 Oct 2022
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
67
11
0
04 Nov 2021
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Tengyu Xu
Zhe Wang
Yingbin Liang
55
58
0
07 May 2020
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise
Maxim Kaledin
Eric Moulines
A. Naumov
V. Tadic
Hoi-To Wai
33
73
0
04 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
34
278
0
12 Dec 2019
A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Pan Xu
Quanquan Gu
44
66
0
10 Dec 2019
Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples
Tengyu Xu
Shaofeng Zou
Yingbin Liang
49
73
0
26 Sep 2019
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu
F. Gao
Quanquan Gu
57
86
0
18 Sep 2019
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Lingxiao Wang
Qi Cai
Zhuoran Yang
Zhaoran Wang
47
239
0
29 Aug 2019
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning
Harsh Gupta
R. Srikant
Lei Ying
41
85
0
14 Jul 2019
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Zhuoran Yang
Yongxin Chen
Mingyi Hong
Zhaoran Wang
76
39
0
14 Jul 2019
Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
Kaiqing Zhang
Alec Koppel
Hao Zhu
Tamer Basar
56
187
0
19 Jun 2019
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory
Bin Hu
U. Syed
53
58
0
16 Jun 2019
An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient
Pan Xu
F. Gao
Quanquan Gu
44
94
0
29 May 2019
Finite-Sample Analysis for SARSA with Linear Function Approximation
Shaofeng Zou
Tengyu Xu
Yingbin Liang
46
147
0
06 Feb 2019
Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning
R. Srikant
Lei Ying
55
249
0
03 Feb 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
50
801
0
10 Jul 2018
Stochastic Variance-Reduced Policy Gradient
Matteo Papini
Damiano Binaghi
Giuseppe Canonaco
Matteo Pirotta
Marcello Restelli
45
174
0
14 Jun 2018
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Jalaj Bhandari
Daniel Russo
Raghav Singal
91
336
0
06 Jun 2018
Sample Efficient Actor-Critic with Experience Replay
Ziyu Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
79
757
0
03 Nov 2016
An Actor-Critic Algorithm for Sequence Prediction
Dzmitry Bahdanau
Philemon Brakel
Kelvin Xu
Anirudh Goyal
Ryan J. Lowe
Joelle Pineau
Aaron Courville
Yoshua Bengio
94
637
0
24 Jul 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
Mehdi Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
159
8,805
0
04 Feb 2016
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
237
6,722
0
19 Feb 2015