2106.00661
v4 (latest)
Reward is enough for convex MDPs
1 June 2021
Tom Zahavy
Brendan O'Donoghue
Guillaume Desjardins
Satinder Singh
Papers citing "Reward is enough for convex MDPs"
23 / 23 papers shown
Title
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
207
0
0
12 May 2025
Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints
Pavel Kolev
Marin Vlastelica
Georg Martius
OffRL
85
0
0
08 Jan 2025
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
87
3
0
20 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
164
4
0
25 Sep 2024
The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
33
0
0
23 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
182
2
0
29 Aug 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
149
2
0
11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
105
1
0
30 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
113
0
0
25 Apr 2024
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning
Tianchi Cai
Jiyan Jiang
Wenpeng Zhang
Shiji Zhou
Xierui Song
Li Yu
Lihong Gu
Xiaodong Zeng
Jinjie Gu
Guannan Zhang
OffRL
61
3
0
06 Sep 2023
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
Dongsheng Ding
Chen-Yu Wei
Jianchao Tan
Alejandro Ribeiro
132
22
0
20 Jun 2023
A Coupled Flow Approach to Imitation Learning
G. Freund
Elad Sarafian
Sarit Kraus
OOD
83
13
0
29 Apr 2023
Reinforcement Learning in Low-Rank MDPs with Density Features
Audrey Huang
Jinglin Chen
Nan Jiang
OffRL
90
14
0
04 Feb 2023
Policy Gradient for Reinforcement Learning with General Utilities
Navdeep Kumar
Kaixin Wang
Kfir Y. Levy
Shie Mannor
48
4
0
03 Oct 2022
Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions
Chenhao Li
Sebastian Blaes
Pavel Kolev
Marin Vlastelica
Jonas Frey
Georg Martius
SSL
130
31
0
16 Sep 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark Schmidt
OffRL
97
6
0
29 Jul 2022
Active Exploration via Experiment Design in Markov Chains
Mojmír Mutný
Tadeusz Janik
Andreas Krause
99
16
0
29 Jun 2022
Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain
Sharan Vaswani
Reza Babanezhad
Csaba Szepesvári
Doina Precup
83
7
0
11 Apr 2022
Your Policy Regularizer is Secretly an Adversary
Rob Brekelmans
Tim Genewein
Jordi Grau-Moya
Grégoire Delétang
M. Kunesch
Shane Legg
Pedro A. Ortega
AAML
93
14
0
23 Mar 2022
Challenging Common Assumptions in Convex Reinforcement Learning
Mirco Mutti
Ric De Santi
Piersilvio De Bartolomeis
Marcello Restelli
OffRL
116
23
0
03 Feb 2022
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
Johannes Müller
Guido Montúfar
100
8
0
14 Oct 2021
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
Guy Tennenholtz
Assaf Hallak
Gal Dalal
Shie Mannor
Gal Chechik
Uri Shalit
OOD
OffRL
123
16
0
13 Oct 2021
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
Matthieu Geist
Julien Pérolat
Mathieu Laurière
Romuald Elie
Sarah Perrin
Olivier Bachem
Rémi Munos
Olivier Pietquin
112
65
0
07 Jun 2021