2303.04386
Policy Mirror Descent Inherently Explores Action Space
8 March 2023
Yan Li
Guanghui Lan
OffRL
Papers citing
"Policy Mirror Descent Inherently Explores Action Space"
21 / 21 papers shown
Title
First-order Policy Optimization for Robust Markov Decision Process
Yan Li
Guanghui Lan
Tuo Zhao
149
25
0
21 Sep 2022
Stochastic first-order methods for average-reward Markov decision processes
Tianjiao Li
Feiyang Wu
Guanghui Lan
64
14
0
11 May 2022
Stochastic linear optimization never overfits with quadratically-bounded losses on general data
Matus Telgarsky
66
12
0
14 Feb 2022
Block Policy Mirror Descent
Guanghui Lan
Yan Li
T. Zhao
OffRL
61
10
0
15 Jan 2022
Actor-critic is implicitly biased towards high entropy optimal policies
Yuzheng Hu
Ziwei Ji
Matus Telgarsky
90
11
0
21 Oct 2021
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Wenhao Zhan
Shicong Cen
Baihe Huang
Yuxin Chen
Jason D. Lee
Yuejie Chi
69
78
0
24 May 2021
Provably Correct Optimization and Exploration with Non-linear Policies
Fei Feng
W. Yin
Alekh Agarwal
Lin F. Yang
140
13
0
22 Mar 2021
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
Guanghui Lan
184
143
0
30 Jan 2021
Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm
S. Khodadadian
Thinh T. Doan
Justin Romberg
S. T. Maguluri
85
43
0
26 Jan 2021
Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning
Georgios Kotsalis
Guanghui Lan
Tianjiao Li
OffRL
57
32
0
15 Nov 2020
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
Alekh Agarwal
Mikael Henaff
Sham Kakade
Wen Sun
OffRL
78
110
0
16 Jul 2020
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
63
90
0
19 Feb 2020
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
83
283
0
12 Dec 2019
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
Lior Shani
Yonathan Efroni
Shie Mannor
60
176
0
06 Sep 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
72
321
0
01 Aug 2019
Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning
Kyungjae Lee
Sungyub Kim
Sungbin Lim
Sungjoon Choi
Songhwai Oh
131
28
0
31 Jan 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
81
812
0
10 Jul 2018
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Jalaj Bhandari
Daniel Russo
Raghav Singal
113
340
0
06 Jun 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
553
19,296
0
20 Jul 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
95
778
0
16 Mar 2017
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
281
6,801
0
19 Feb 2015