ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

A Theory of Regularized Markov Decision Processes

31 January 2019
M. Geist
B. Scherrer
Olivier Pietquin
arXiv:1901.11275

Papers citing "A Theory of Regularized Markov Decision Processes"

50 / 91 papers shown
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
16
0
0
16 May 2025
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
39
0
0
07 May 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
85
0
0
26 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
52
0
0
09 Feb 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
98
0
0
06 Feb 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
68
5
0
17 Jan 2025
Bounded Rationality Equilibrium Learning in Mean Field Games
Yannick Eich
Christian Fabian
Kai Cui
Heinz Koeppl
38
0
0
11 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
71
3
0
07 Nov 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
67
4
0
24 Oct 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
26
0
0
20 Oct 2024
Last Iterate Convergence in Monotone Mean Field Games
Noboru Isobe
Kenshi Abe
Kaito Ariu
51
0
0
07 Oct 2024
Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
Sangwoong Yoon
Himchan Hwang
Dohyun Kwon
Yung-Kyun Noh
Frank C. Park
44
3
0
30 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
62
14
0
24 Jun 2024
Decoupling regularization from the action space
Sobhan Mohammadpour
Emma Frejinger
Pierre-Luc Bacon
37
0
0
10 Jun 2024
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
Johannes Muller
Semih Cayci
50
0
0
06 Jun 2024
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
OffRL
25
1
0
03 Jun 2024
Performance of NPG in Countable State-Space Average-Cost RL
Yashaswini Murthy
Isaac Grosof
S. T. Maguluri
R. Srikant
OffRL
39
1
0
30 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
55
2
0
30 May 2024
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Titouan Renard
Andreas Schlaginhaufen
Tingting Ni
Maryam Kamgarpour
64
1
0
25 Mar 2024
Enhancing Reinforcement Learning Agents with Local Guides
Paul Daoudi
Bogdan Robu
Christophe Prieur
Ludovic Dos Santos
M. Barlier
OnRL
33
3
0
21 Feb 2024
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
49
2
0
26 Jan 2024
On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR
Jaeuk Shin
Giho Kim
Howon Lee
Joonho Han
Insoon Yang
OffRL
45
1
0
09 Dec 2023
Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
Zeyang Li
Chuxiong Hu
Yunan Wang
Guojian Zhan
Jie Li
Shengbo Eben Li
32
0
0
11 Oct 2023
Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri
M. Rezghi
47
0
0
13 Aug 2023
Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Tongxin Li
Yiheng Lin
Shaolei Ren
Adam Wierman
AAML
OffRL
44
6
0
20 Jul 2023
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
34
10
0
01 Jun 2023
Coherent Soft Imitation Learning
Joe Watson
Sandy H. Huang
Nicholas Heess
39
11
0
25 May 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura
Tadashi Kozuno
Yunhao Tang
Nino Vieillard
Michal Valko
...
Olivier Pietquin
M. Geist
Csaba Szepesvári
Wataru Kumagai
Yutaka Matsuo
OffRL
35
3
0
22 May 2023
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Haoran Xu
Li Jiang
Jianxiong Li
Zhuoran Yang
Zhaoran Wang
Victor Chan
Xianyuan Zhan
OffRL
41
73
0
28 Mar 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
38
0
0
22 Mar 2023
Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization
E. Derman
Yevgeniy Men
M. Geist
Shie Mannor
47
1
0
12 Mar 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
55
5
0
05 Feb 2023
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods
Gen Li
Yanxi Chen
Yu Huang
Yuejie Chi
H. Vincent Poor
Yuxin Chen
OT
51
5
0
30 Jan 2023
Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Dong-Sig Han
Hyunseok Kim
Hyun-Dong Lee
Je-hwan Ryu
Byoung-Tak Zhang
33
2
0
20 Oct 2022
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
Shicong Cen
Yuejie Chi
S. Du
Lin Xiao
66
35
0
03 Oct 2022
Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
David Bertoin
Adil Zouitine
Mehdi Zouitine
Emmanuel Rachelson
45
30
0
16 Sep 2022
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
J. Hau
Marek Petrik
Mohammad Ghavamzadeh
R. Russel
37
5
0
09 Sep 2022
Weighted Maximum Entropy Inverse Reinforcement Learning
Viet The Bui
Tien Mai
Patrick Jaillet
27
0
0
20 Aug 2022
Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
41
8
0
08 Aug 2022
The Power of Regularization in Solving Extensive-Form Games
Ming Liu
Asuman Ozdaglar
Tiancheng Yu
Kai Zhang
36
20
0
19 Jun 2022
Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
Alexis Jacq
Johan Ferret
Olivier Pietquin
M. Geist
32
9
0
16 Mar 2022
DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning
Jinxin Liu
Hongyin Zhang
Donglin Wang
OffRL
38
33
0
13 Mar 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes
Haoya Li
Hsiang-Fu Yu
Lexing Ying
Inderjit Dhillon
39
4
0
21 Feb 2022
Regularized Q-learning
Han-Dong Lim
Donghwan Lee
29
10
0
11 Feb 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning
Ching-An Cheng
Tengyang Xie
Nan Jiang
Alekh Agarwal
OffRL
23
127
0
05 Feb 2022
You May Not Need Ratio Clipping in PPO
Mingfei Sun
Vitaly Kurin
Guoqing Liu
Sam Devlin
Tao Qin
Katja Hofmann
Shimon Whiteson
21
15
0
31 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
43
12
0
18 Jan 2022
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
34
168
0
08 Dec 2021
Twice regularized MDPs and the equivalence between robustness and regularization
E. Derman
M. Geist
Shie Mannor
53
54
0
12 Oct 2021
Approximate Newton policy gradient algorithms
Haoya Li
Samarth Gupta
Hsiangfu Yu
Lexing Ying
Inderjit Dhillon
56
2
0
05 Oct 2021