1606.02647
Cited By
Safe and Efficient Off-Policy Reinforcement Learning
8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"
50 / 374 papers shown
Title
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Tom Danino
Nahum Shimkin
55
0
0
03 Jun 2025
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
64
0
0
16 May 2025
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL
OnRL
108
0
0
16 May 2025
Trust-Region Twisted Policy Improvement
Joery A. de Vries
Jinke He
Yaniv Oren
M. Spaan
OffRL
LRM
133
0
0
08 Apr 2025
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Yoav Wald
M. Goldstein
Yonathan Efroni
Wouter A. C. van Amsterdam
Rajesh Ranganath
CML
179
0
0
20 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
172
6
0
18 Mar 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Kun Shao
OffRL
122
29
0
24 Feb 2025
Actor Critic with Experience Replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy
Md Mainul Abrar
Parvat Sapkota
Damon Sprouts
Xun Jia
Yujie Chi
OffRL
63
0
0
01 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
174
16
0
28 Jan 2025
GraCo -- A Graph Composer for Integrated Circuits
Stefan Uhlich
Andrea Bonetti
Arun Venkitaraman
Ali Momeni
Ryoga Matsuo
Chia-Yu Hsieh
Eisaku Ohbuchi
Lorenzo Servadei
GNN
155
2
0
21 Nov 2024
A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
Qihan Qi
Xinsong Yang
Gang Xia
Daniel W. C. Ho
Pengyang Tang
94
0
0
09 Oct 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
134
0
0
02 Sep 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
165
26
0
05 Jul 2024
Two-Step Q-Learning
Antony Vijesh
Shreyas Sumithra Rudresha
OffRL
93
1
0
02 Jul 2024
Demystifying the Recency Heuristic in Temporal-Difference Learning
Brett Daley
Marlos C. Machado
Martha White
72
1
0
18 Jun 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou
Ravi Agrawal
Shujian Zhang
Sathish Indurthi
Sanqiang Zhao
Kaiqiang Song
Silei Xu
Chenguang Zhu
105
20
0
17 Jun 2024
Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang
Vincent Zhu
Naomi Saphra
Anat Kleiman
Benjamin L. Edelman
Milind Tambe
Sham Kakade
Eran Malach
120
15
0
17 Jun 2024
Reflective Policy Optimization
Yaozhong Gan
Renye Yan
Zhe Wu
Junliang Xing
84
1
0
06 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu
Laixi Shi
Yuhao Ding
Alois Knoll
C. Spanos
Adam Wierman
Ming Jin
OffRL
88
2
0
31 May 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
111
3
0
29 May 2024
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
Haanvid Lee
Tri Wahyu Guntara
Jongmin Lee
Yung-Kyun Noh
Kee-Eung Kim
OffRL
64
1
0
29 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
108
3
0
28 May 2024
Highway Reinforcement Learning
Yuhui Wang
M. Strupl
Francesco Faccio
Qingyuan Wu
Haozhe Liu
Michal Grudzień
Xiaoyang Tan
Jürgen Schmidhuber
OffRL
73
4
0
28 May 2024
Deep Dive into Model-free Reinforcement Learning for Biological and Robotic Systems: Theory and Practice
Yusheng Jiao
Feng Ling
Sina Heydari
N. Heess
J. Merel
Eva Kanso
64
1
0
19 May 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL
AAML
84
0
0
18 May 2024
Adaptive Exploration for Data-Efficient General Value Function Evaluations
Arushi Jain
Josiah P. Hanna
Doina Precup
59
2
0
13 May 2024
Shared learning of powertrain control policies for vehicle fleets
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
72
1
0
27 Apr 2024
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Dohyeong Kim
Mineui Hong
Jeongho Park
Songhwai Oh
76
0
0
01 Mar 2024
Offline Fictitious Self-Play for Competitive Games
Jingxiao Chen
Weiji Xie
Weinan Zhang
Yong Zu
Ying Wen
OffRL
79
0
0
29 Feb 2024
Skill or Luck? Return Decomposition via Advantage Functions
Hsiao-Ru Pan
Bernhard Schölkopf
OffRL
40
5
0
20 Feb 2024
Discovering Command and Control (C2) Channels on Tor and Public Networks Using Reinforcement Learning
Cheng Wang
Christopher Redino
Abdul Rahman
Ryan Clark
Dan Radke
Tyler Cody
Dhruv Nandakumar
Edward Bowen
59
3
0
14 Feb 2024
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
OffRL
60
1
0
08 Feb 2024
Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
Abdelhakim Benechehab
Albert Thomas
Balázs Kégl
OffRL
67
2
0
05 Feb 2024
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
Teng Xiao
Suhang Wang
OffRL
73
8
0
17 Jan 2024
Neural Population Learning beyond Symmetric Zero-sum Games
Siqi Liu
Luke Marris
Marc Lanctot
Georgios Piliouras
Joel Z Leibo
N. Heess
MLT
95
3
0
10 Jan 2024
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Shaan ul Haque
S. Khodadadian
S. T. Maguluri
138
11
0
31 Dec 2023
TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient
Xingzhou Lou
Junge Zhang
Timothy J. Norman
Kaiqi Huang
Yali Du
70
1
0
25 Dec 2023
Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Longchao Da
Porter Jenkins
Trevor Schwantes
Jeffrey Dotson
Hua Wei
OffRL
54
2
0
17 Dec 2023
Stochastic Optimal Control Matching
Carles Domingo-Enrich
Jiequn Han
Brandon Amos
Joan Bruna
Ricky T. Q. Chen
DiffM
122
10
0
04 Dec 2023
Bias Resilient Multi-Step Off-Policy Goal-Conditioned Reinforcement Learning
Lisheng Wu
Ke Chen
64
0
0
29 Nov 2023
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Ryan Shea
Zhou Yu
OffRL
97
8
0
16 Oct 2023
Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization
Bang Giang Le
Viet-Cuong Ta
OT
81
1
0
27 Sep 2023
Hybrid of representation learning and reinforcement learning for dynamic and complex robotic motion planning
Chengmin Zhou
Xin Lu
Jiapeng Dai
Bingding Huang
Xiaoxu Liu
Pasi Fränti
73
2
0
07 Sep 2023
Counterfactual Explanation Policies in RL
Shripad Deshmukh
R Srivatsan
Supriti Vijay
Jayakumar Subramanian
Chirag Agarwal
OffRL
59
0
0
25 Jul 2023
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Qiang He
Dinesh Manocha
Meng Fang
S. Maghsudi
76
5
0
29 Jun 2023
Value-aware Importance Weighting for Off-policy Reinforcement Learning
Kristopher De Asis
Eric Graves
R. Sutton
OffRL
58
1
0
27 Jun 2023
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Rishabh Agarwal
Marc G. Bellemare
Will Dabney
OffRL
OOD
SSL
138
10
0
16 Jun 2023
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second
Vincent-Pierre Berges
Andrew Szot
Devendra Singh Chaplot
Aaron Gokaslan
Roozbeh Mottaghi
Dhruv Batra
Eric Undersander
LRM
LM&Ro
96
5
0
13 Jun 2023
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Yunhao Tang
Tadashi Kozuno
Mark Rowland
Anna Harutyunyan
Rémi Munos
Bernardo Avila-Pires
Michal Valko
27
0
0
29 May 2023
Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection
Jiajun Fan
Yuzheng Zhuang
Yuecheng Liu
Jianye Hao
Bin Wang
Jiangcheng Zhu
Hao Wang
Shutao Xia
72
18
0
09 May 2023