2106.00661
Cited By
v1
v2
v3
v4 (latest)
Reward is enough for convex MDPs
1 June 2021
Tom Zahavy
Brendan O'Donoghue
Guillaume Desjardins
Satinder Singh
Papers citing "Reward is enough for convex MDPs"
29 / 29 papers shown
Title
Central Path Proximal Policy Optimization
Nikola Milosevic
Johannes Müller
Nico Scherf
80
2
0
31 May 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
221
0
0
12 May 2025
Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints
Pavel Kolev
Marin Vlastelica
Georg Martius
OffRL
123
3
0
08 Jan 2025
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
111
3
0
20 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
190
6
0
25 Sep 2024
The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
52
1
0
23 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
220
3
0
29 Aug 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
203
3
0
11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
119
1
0
30 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
143
0
0
25 Apr 2024
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
Joar Skalse
Alessandro Abate
119
11
0
26 Jan 2024
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning
Tianchi Cai
Jiyan Jiang
Wenpeng Zhang
Shiji Zhou
Xierui Song
Li Yu
Lihong Gu
Xiaodong Zeng
Jinjie Gu
Guannan Zhang
OffRL
73
7
0
06 Sep 2023
Diversifying AI: Towards Creative Chess with AlphaZero
Tom Zahavy
Vivek Veeriah
Shaobo Hou
Kevin Waugh
Matthew Lai
Edouard Leurent
Nenad Tomašev
Lisa Schut
Demis Hassabis
Satinder Singh
123
17
0
17 Aug 2023
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
Dongsheng Ding
Chen-Yu Wei
Jianchao Tan
Alejandro Ribeiro
160
23
0
20 Jun 2023
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
Anas Barakat
Ilyas Fatkhullin
Niao He
113
12
0
02 Jun 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities
Donghao Ying
Yunkai Zhang
Yuhao Ding
Alec Koppel
Javad Lavaei
128
15
0
27 May 2023
A Coupled Flow Approach to Imitation Learning
G. Freund
Elad Sarafian
Sarit Kraus
OOD
93
14
0
29 Apr 2023
Reinforcement Learning in Low-Rank MDPs with Density Features
Audrey Huang
Jinglin Chen
Nan Jiang
OffRL
102
14
0
04 Feb 2023
Policy Gradient for Reinforcement Learning with General Utilities
Navdeep Kumar
Kaixin Wang
Kfir Y. Levy
Shie Mannor
60
4
0
03 Oct 2022
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
Raj Ghugare
Homanga Bharadhwaj
Benjamin Eysenbach
Sergey Levine
Ruslan Salakhutdinov
OffRL
156
28
0
18 Sep 2022
Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions
Chenhao Li
Sebastian Blaes
Pavel Kolev
Marin Vlastelica
Jonas Frey
Georg Martius
SSL
172
33
0
16 Sep 2022
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark Schmidt
OffRL
112
7
0
29 Jul 2022
Active Exploration via Experiment Design in Markov Chains
Mojmír Mutný
Tadeusz Janik
Andreas Krause
109
16
0
29 Jun 2022
Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain
Sharan Vaswani
Reza Babanezhad
Csaba Szepesvári
Doina Precup
93
7
0
11 Apr 2022
Your Policy Regularizer is Secretly an Adversary
Rob Brekelmans
Tim Genewein
Jordi Grau-Moya
Grégoire Delétang
M. Kunesch
Shane Legg
Pedro A. Ortega
AAML
117
16
0
23 Mar 2022
Challenging Common Assumptions in Convex Reinforcement Learning
Mirco Mutti
Ric De Santi
Piersilvio De Bartolomeis
Marcello Restelli
OffRL
140
23
0
03 Feb 2022
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
Johannes Müller
Guido Montúfar
132
8
0
14 Oct 2021
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
Guy Tennenholtz
Assaf Hallak
Gal Dalal
Shie Mannor
Gal Chechik
Uri Shalit
OOD
OffRL
160
16
0
13 Oct 2021
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
Matthieu Geist
Julien Pérolat
Mathieu Laurière
Romuald Elie
Sarah Perrin
Olivier Bachem
Rémi Munos
Olivier Pietquin
116
68
0
07 Jun 2021