Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1901.11275
Cited By
A Theory of Regularized Markov Decision Processes
31 January 2019
M. Geist
B. Scherrer
Olivier Pietquin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Theory of Regularized Markov Decision Processes"
50 / 91 papers shown
Title
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
16
0
0
16 May 2025
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
39
0
0
07 May 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
85
0
0
26 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
52
0
0
09 Feb 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
98
0
0
06 Feb 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
68
5
0
17 Jan 2025
Bounded Rationality Equilibrium Learning in Mean Field Games
Yannick Eich
Christian Fabian
Kai Cui
Heinz Koeppl
38
0
0
11 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
71
3
0
07 Nov 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
67
4
0
24 Oct 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
26
0
0
20 Oct 2024
Last Iterate Convergence in Monotone Mean Field Games
Noboru Isobe
Kenshi Abe
Kaito Ariu
51
0
0
07 Oct 2024
Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
Sangwoong Yoon
Himchan Hwang
Dohyun Kwon
Yung-Kyun Noh
Frank C. Park
44
3
0
30 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
62
14
0
24 Jun 2024
Decoupling regularization from the action space
Sobhan Mohammadpour
Emma Frejinger
Pierre-Luc Bacon
37
0
0
10 Jun 2024
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Muller
Semih Cayci
50
0
0
06 Jun 2024
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
OffRL
25
1
0
03 Jun 2024
Performance of NPG in Countable State-Space Average-Cost RL
Yashaswini Murthy
Isaac Grosof
S. T. Maguluri
R. Srikant
OffRL
39
1
0
30 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
55
2
0
30 May 2024
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Titouan Renard
Andreas Schlaginhaufen
Tingting Ni
Maryam Kamgarpour
64
1
0
25 Mar 2024
Enhancing Reinforcement Learning Agents with Local Guides
Paul Daoudi
Bogdan Robu
Christophe Prieur
Ludovic Dos Santos
M. Barlier
OnRL
33
3
0
21 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
49
2
0
26 Jan 2024
On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR
Jaeuk Shin
Giho Kim
Howon Lee
Joonho Han
Insoon Yang
OffRL
45
1
0
09 Dec 2023
Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
Zeyang Li
Chuxiong Hu
Yunan Wang
Guojian Zhan
Jie Li
Shengbo Eben Li
32
0
0
11 Oct 2023
Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri
M. Rezghi
47
0
0
13 Aug 2023
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Tongxin Li
Yiheng Lin
Shaolei Ren
Adam Wierman
AAML
OffRL
44
6
0
20 Jul 2023
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
34
10
0
01 Jun 2023
Coherent Soft Imitation Learning
Joe Watson
Sandy H. Huang
Nicholas Heess
39
11
0
25 May 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura
Tadashi Kozuno
Yunhao Tang
Nino Vieillard
Michal Valko
...
Olivier Pietquin
M. Geist
Csaba Szepesvári
Wataru Kumagai
Yutaka Matsuo
OffRL
35
3
0
22 May 2023
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Haoran Xu
Li Jiang
Jianxiong Li
Zhuoran Yang
Zhaoran Wang
Victor Chan
Xianyuan Zhan
OffRL
41
73
0
28 Mar 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
38
0
0
22 Mar 2023
Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization
E. Derman
Yevgeniy Men
M. Geist
Shie Mannor
47
1
0
12 Mar 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
55
5
0
05 Feb 2023
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods
Gen Li
Yanxi Chen
Yu Huang
Yuejie Chi
H. Vincent Poor
Yuxin Chen
OT
51
5
0
30 Jan 2023
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Dong-Sig Han
Hyunseok Kim
Hyun-Dong Lee
Je-hwan Ryu
Byoung-Tak Zhang
33
2
0
20 Oct 2022
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
Shicong Cen
Yuejie Chi
S. Du
Lin Xiao
66
35
0
03 Oct 2022
Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
David Bertoin
Adil Zouitine
Mehdi Zouitine
Emmanuel Rachelson
45
30
0
16 Sep 2022
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
J. Hau
Marek Petrik
Mohammad Ghavamzadeh
R. Russel
37
5
0
09 Sep 2022
Weighted Maximum Entropy Inverse Reinforcement Learning
Viet The Bui
Tien Mai
Patrick Jaillet
27
0
0
20 Aug 2022
Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
41
8
0
08 Aug 2022
The Power of Regularization in Solving Extensive-Form Games
Ming Liu
Asuman Ozdaglar
Tiancheng Yu
Kai Zhang
36
20
0
19 Jun 2022
Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
Alexis Jacq
Johan Ferret
Olivier Pietquin
M. Geist
32
9
0
16 Mar 2022
DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
Jinxin Liu
Hongyin Zhang
Donglin Wang
OffRL
38
33
0
13 Mar 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
39
4
0
21 Feb 2022
Regularized Q-learning
Han-Dong Lim
Donghwan Lee
29
10
0
11 Feb 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning
Ching-An Cheng
Tengyang Xie
Nan Jiang
Alekh Agarwal
OffRL
23
127
0
05 Feb 2022
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
21
15
0
31 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
43
12
0
18 Jan 2022
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
34
168
0
08 Dec 2021
Twice regularized MDPs and the equivalence between robustness and regularization
E. Derman
M. Geist
Shie Mannor
53
54
0
12 Oct 2021
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
56
2
0
05 Oct 2021
1
2
Next