2005.06392
On the Global Convergence Rates of Softmax Policy Gradient Methods
13 May 2020
Jincheng Mei
Chenjun Xiao
Csaba Szepesvári
Dale Schuurmans
Papers citing
"On the Global Convergence Rates of Softmax Policy Gradient Methods"
50 / 185 papers shown
Title
Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles
Amir Ali Farzin
Yuen-Man Pun
Iman Shames
31
0
0
04 May 2025
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
58
4
0
02 Apr 2025
Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch
Weizhen Wang
Jianping He
Xiaoming Duan
34
0
0
28 Mar 2025
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
Matthieu Meunier
C. Reisinger
Yufei Zhang
39
0
0
27 Mar 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
X. Wang
50
0
0
25 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
61
2
0
13 Feb 2025
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Jincheng Mei
Bo Dai
Alekh Agarwal
Sharan Vaswani
Anant Raj
Csaba Szepesvári
Dale Schuurmans
89
0
0
11 Feb 2025
On Penalty-based Bilevel Gradient Descent Method
Han Shen
Quan-Wu Xiao
Tianyi Chen
60
51
0
08 Jan 2025
Structure Matters: Dynamic Policy Gradient
Sara Klein
Xiangyuan Zhang
Tamer Basar
Simon Weissmann
Leif Döring
35
0
0
07 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
25
1
0
05 Nov 2024
Risk-sensitive control as inference with Rényi divergence
Kaito Ito
Kenji Kashima
34
1
0
04 Nov 2024
Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms
Navdeep Kumar
Priyank Agrawal
Giorgia Ramponi
Kfir Y. Levy
Shie Mannor
33
0
0
11 Oct 2024
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi
Runlong Zhou
Simon S. Du
55
8
0
29 Sep 2024
Towards Fast Rates for Federated and Multi-Task Reinforcement Learning
Feng Zhu
Robert W. Heath Jr.
Aritra Mitra
35
1
0
09 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
72
0
0
29 Aug 2024
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
Batuhan Yardim
Niao He
AI4CE
43
5
0
27 Aug 2024
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
Jean Seong Bjorn Choe
Jong-Kook Kim
40
2
0
25 Jul 2024
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
30
0
0
23 Jul 2024
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Alessandro Montenegro
Marco Mussi
Matteo Papini
Alberto Maria Metelli
BDL
40
1
0
15 Jul 2024
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning
Amit Sharma
Hua Li
Xue Li
Jian Jiao
LRM
39
0
0
20 Jun 2024
A Generalized Version of Chung's Lemma and its Applications
Li Jiang
Xiao Li
Andre Milzarek
Junwen Qiu
45
1
0
09 Jun 2024
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Müller
Semih Cayci
41
0
0
06 Jun 2024
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
OffRL
23
1
0
03 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
43
2
0
30 May 2024
A CMDP-within-online framework for Meta-Safe Reinforcement Learning
Vanshaj Khattar
Yuhao Ding
Bilgehan Sel
Javad Lavaei
Ming Jin
OffRL
32
12
0
26 May 2024
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Minheng Xiao
Xian Yu
Lei Ying
37
2
0
23 May 2024
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann
Sara Klein
Waïss Azizian
Leif Döring
39
3
0
22 May 2024
Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning
Sihan Zeng
Thinh T. Doan
54
5
0
15 May 2024
Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization
Youbang Sun
Tao-Wen Liu
P. R. Kumar
Shahin Shahrampour
37
0
0
04 May 2024
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Alessandro Montenegro
Marco Mussi
Alberto Maria Metelli
Matteo Papini
42
2
0
03 May 2024
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Titouan Renard
Andreas Schlaginhaufen
Tingting Ni
Maryam Kamgarpour
51
1
0
25 Mar 2024
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
Bhrij Patel
Wesley A Suttle
Alec Koppel
Vaneet Aggarwal
Brian M Sadler
Amrit Singh Bedi
Dinesh Manocha
32
1
0
18 Mar 2024
On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes
Navdeep Kumar
Yashaswini Murthy
Itai Shufaro
Kfir Y. Levy
R. Srikant
Shie Mannor
36
2
0
11 Mar 2024
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
Min Cheng
Ruida Zhou
P. R. Kumar
Chao Tian
49
2
0
09 Mar 2024
Stochastic Gradient Succeeds for Bandits
Jincheng Mei
Zixin Zhong
Bo Dai
Alekh Agarwal
Csaba Szepesvári
Dale Schuurmans
37
1
0
27 Feb 2024
MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
Xinglin Zhou
Yifu Yuan
Shaofu Yang
Jianye Hao
34
1
0
22 Feb 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
34
14
0
10 Feb 2024
On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition
Yunyan Bai
Yuxing Liu
Luo Luo
26
0
0
04 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
35
2
0
26 Jan 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
Ling Liang
Haizhao Yang
14
1
0
23 Jan 2024
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Jie Feng
Ke Wei
Jinchi Chen
30
1
0
02 Jan 2024
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Nai-Chieh Huang
Ping-Chun Hsieh
Kuo-Hao Ho
I-Chen Wu
21
8
0
19 Dec 2023
Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models
Theodore Wolf
Nantas Nardelli
John Shawe-Taylor
Maria Perez-Ortiz
21
1
0
11 Dec 2023
Onflow: an online portfolio allocation algorithm
G. Turinici
Pierre Brugiere
15
0
0
08 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
36
3
0
23 Nov 2023
A Large Deviations Perspective on Policy Gradient Algorithms
Wouter Jongeneel
Daniel Kuhn
Mengmeng Li
28
1
0
13 Nov 2023
On the Second-Order Convergence of Biased Policy Gradient Algorithms
Siqiao Mu
Diego Klabjan
48
2
0
05 Nov 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
15
7
0
31 Oct 2023