
Reward is enough for convex MDPs

1 June 2021

Papers citing "Reward is enough for convex MDPs"

29 / 29 papers shown

Title | Authors | Tags | Metrics | Date
Central Path Proximal Policy Optimization | Nikola Milosevic, Johannes Müller, Nico Scherf | | 80 2 0 | 31 May 2025
Online Episodic Convex Reinforcement Learning | B. Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane | OffRL | 221 0 0 | 12 May 2025
Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints | Pavel Kolev, Marin Vlastelica, Georg Martius | OffRL | 123 3 0 | 08 Jan 2025
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization | Timofei Gritsaev, Nikita Morozov, S. Samsonov, D. Tiapkin | | 111 3 0 | 20 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference | Qining Zhang, Lei Ying | OffRL | 190 6 0 | 25 Sep 2024
The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes | Pedro P. Santos, Alberto Sardinha, Francisco S. Melo | | 52 1 0 | 23 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form | Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Y. Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo | | 220 3 0 | 29 Aug 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang, Honghao Wei, Lei Ying | OffRL | 203 3 0 | 11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning | B. Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane | OffRL | 119 1 0 | 30 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints | Bram De Cooman, Johan A. K. Suykens | | 143 0 0 | 25 Apr 2024
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks | Joar Skalse, Alessandro Abate | | 119 11 0 | 26 Jan 2024
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning | Tianchi Cai, Jiyan Jiang, Wenpeng Zhang, Shiji Zhou, Xierui Song, Li Yu, Lihong Gu, Xiaodong Zeng, Jinjie Gu, Guannan Zhang | OffRL | 73 7 0 | 06 Sep 2023
Diversifying AI: Towards Creative Chess with AlphaZero | Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomašev, Lisa Schut, Demis Hassabis, Satinder Singh | | 123 17 0 | 17 Aug 2023
Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs | Dongsheng Ding, Chen-Yu Wei, Jianchao Tan, Alejandro Ribeiro | | 160 23 0 | 20 Jun 2023
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space | Anas Barakat, Ilyas Fatkhullin, Niao He | | 113 12 0 | 02 Jun 2023
Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities | Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei | | 128 15 0 | 27 May 2023
A Coupled Flow Approach to Imitation Learning | G. Freund, Elad Sarafian, Sarit Kraus | OOD | 93 14 0 | 29 Apr 2023
Reinforcement Learning in Low-Rank MDPs with Density Features | Audrey Huang, Jinglin Chen, Nan Jiang | OffRL | 102 14 0 | 04 Feb 2023
Policy Gradient for Reinforcement Learning with General Utilities | Navdeep Kumar, Kaixin Wang, Kfir Y. Levy, Shie Mannor | | 60 4 0 | 03 Oct 2022
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective | Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov | OffRL | 156 28 0 | 18 Sep 2022
Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions | Chenhao Li, Sebastian Blaes, Pavel Kolev, Marin Vlastelica, Jonas Frey, Georg Martius | SSL | 172 33 0 | 16 Sep 2022
Improved Policy Optimization for Online Imitation Learning | J. Lavington, Sharan Vaswani, Mark Schmidt | OffRL | 112 7 0 | 29 Jul 2022
Active Exploration via Experiment Design in Markov Chains | Mojmír Mutný, Tadeusz Janik, Andreas Krause | | 109 16 0 | 29 Jun 2022
Towards Painless Policy Optimization for Constrained MDPs | Arushi Jain, Sharan Vaswani, Reza Babanezhad, Csaba Szepesvári, Doina Precup | | 93 7 0 | 11 Apr 2022
Your Policy Regularizer is Secretly an Adversary | Rob Brekelmans, Tim Genewein, Jordi Grau-Moya, Grégoire Delétang, M. Kunesch, Shane Legg, Pedro A. Ortega | AAML | 117 16 0 | 23 Mar 2022
Challenging Common Assumptions in Convex Reinforcement Learning | Mirco Mutti, Ric De Santi, Piersilvio De Bartolomeis, Marcello Restelli | OffRL | 140 23 0 | 03 Feb 2022
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs | Johannes Muller, Guido Montúfar | | 132 8 0 | 14 Oct 2021
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning | Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit | OOD, OffRL | 160 16 0 | 13 Oct 2021
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint | Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin | | 116 68 0 | 07 Jun 2021