2102.00135
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
30 January 2021
Guanghui Lan
Papers citing
"Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes"
30 / 30 papers shown
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
66
4
0
02 Apr 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Jincheng Mei
Bo Dai
Alekh Agarwal
Sharan Vaswani
Anant Raj
Csaba Szepesvári
Dale Schuurmans
89
0
0
11 Feb 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
93
0
0
06 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Muller
Semih Cayci
47
0
0
06 Jun 2024
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
OffRL
23
1
0
03 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
46
2
0
30 May 2024
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Titouan Renard
Andreas Schlaginhaufen
Tingting Ni
Maryam Kamgarpour
53
1
0
25 Mar 2024
Policy Mirror Descent with Lookahead
Kimon Protopapas
Anas Barakat
29
1
0
21 Mar 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
14
1
0
23 Jan 2024
A Large Deviations Perspective on Policy Gradient Algorithms
Wouter Jongeneel
Daniel Kuhn
Mengmeng Li
31
1
0
13 Nov 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
32
12
0
30 Jan 2023
Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu
Chen Xie
Qinwen Deng
Dongdong Ge
Yi-Li Ye
32
1
0
28 Jan 2023
The Role of Baselines in Policy Gradient Optimization
Jincheng Mei
Wesley Chung
Valentin Thomas
Bo Dai
Csaba Szepesvári
Dale Schuurmans
29
15
0
16 Jan 2023
Mirror descent of Hopfield model
Hyungjoon Soh
D. Kim
Juno Hwang
Junghyo Jo
25
0
0
29 Nov 2022
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
Shicong Cen
Yuejie Chi
S. Du
Lin Xiao
59
35
0
03 Oct 2022
First-order Policy Optimization for Robust Markov Decision Process
Yan Li
Guanghui Lan
Tuo Zhao
77
23
0
21 Sep 2022
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
Ruida Zhou
Tao-Wen Liu
D. Kalathil
P. R. Kumar
Chao Tian
32
13
0
10 Jun 2022
Algorithm for Constrained Markov Decision Process with Linear Convergence
E. Gladin
Maksim Lavrik-Karmazin
K. Zainullina
Varvara Rudenko
Alexander V. Gasnikov
Martin Takáč
33
6
0
03 Jun 2022
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
Shicong Cen
Fan Chen
Yuejie Chi
33
15
0
12 Apr 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
34
4
0
21 Feb 2022
Mirror Learning: A Unifying Framework of Policy Optimisation
J. Kuba
Christian Schroeder de Witt
Jakob N. Foerster
26
24
0
07 Jan 2022
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
51
2
0
05 Oct 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
29
113
0
19 Aug 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
25
29
0
17 Jul 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
38
15
0
15 Jun 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen
S. Khodadadian
S. T. Maguluri
OffRL
63
29
0
26 May 2021
On the Linear convergence of Natural Policy Gradient Algorithm
S. Khodadadian
P. Jhunjhunwala
Sushil Mahavir Varma
S. T. Maguluri
40
56
0
04 May 2021
Softmax Policy Gradient Methods Can Take Exponential Time to Converge
Gen Li
Yuting Wei
Yuejie Chi
Yuxin Chen
29
50
0
22 Feb 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
71
26
0
18 Feb 2021