1606.02647
Safe and Efficient Off-Policy Reinforcement Learning
8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"
50 / 374 papers shown
Title
On the Optimality, Stability, and Feasibility of Control Barrier Functions: An Adaptive Learning-Based Approach
A. Chriat
Chuangchuang Sun
81
4
0
05 May 2023
Posterior Sampling for Deep Reinforcement Learning
Remo Sasso
Michelangelo Conserva
Paulo E. Rauber
OffRL
BDL
58
7
0
30 Apr 2023
Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition
Y. Liu
Aamir Ahmad
77
4
0
24 Mar 2023
Uncertainty-Aware Instance Reweighting for Off-Policy Learning
Xiaoying Zhang
Junpu Chen
Hongning Wang
Hong Xie
Yang Liu
John C. S. Lui
Hang Li
OffRL
145
4
0
11 Mar 2023
Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play
Wei Xi
Yongxin Zhang
Changnan Xiao
Xuefeng Huang
Shihong Deng
Haowei Liang
Jie Chen
Peng Sun
OffRL
67
8
0
07 Mar 2023
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
Daniel Palenicek
M. Lutter
João Carvalho
Jan Peters
79
4
0
07 Mar 2023
Sequential Counterfactual Risk Minimization
Houssam Zenati
Eustache Diemert
Matthieu Martin
Julien Mairal
Pierre Gaillard
OffRL
74
3
0
23 Feb 2023
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
Brendan O'Donoghue
OffRL
96
7
0
18 Feb 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
82
8
0
18 Feb 2023
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Shuze Liu
Shangtong Zhang
OffRL
99
5
0
31 Jan 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley
Martha White
Chris Amato
Marlos C. Machado
OffRL
138
3
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskill
Philip S. Thomas
OffRL
92
6
0
24 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
139
119
0
18 Jan 2023
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System
Habtamu Hailemichael
B. Ayalew
Lindsey Kerbel
Andrej Ivanco
K. Loiselle
115
4
0
03 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
189
36
0
01 Jan 2023
Control of Continuous Quantum Systems with Many Degrees of Freedom based on Convergent Reinforcement Learning
Zhikang T. Wang
48
0
0
21 Dec 2022
Residual Policy Learning for Powertrain Control
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
31
4
0
15 Dec 2022
Driver Assistance Eco-driving and Transmission Control with Deep Reinforcement Learning
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
OffRL
36
8
0
15 Dec 2022
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
70
0
0
10 Dec 2022
Design and Planning of Flexible Mobile Micro-Grids Using Deep Reinforcement Learning
Cesare Caputo
Michel-Alexandre Cardin
Pudong Ge
Fei Teng
A. Korre
Ehecatl Antonio del Rio Chanona
39
18
0
08 Dec 2022
AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning
Hongjie Zhang
OffRL
38
0
0
28 Nov 2022
General Intelligence Requires Rethinking Exploration
Minqi Jiang
Tim Rocktaschel
Edward Grefenstette
LRM
79
20
0
15 Nov 2022
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Peng Zhang
Yawen Huang
Bingzhang Hu
Shizheng Wang
Haoran Duan
Noura Al Moubayed
Yefeng Zheng
Yang Long
OffRL
60
0
0
02 Nov 2022
AACHER: Assorted Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay
Adarsh Sehgal
Muskan Sehgal
Hung M. La
31
2
0
24 Oct 2022
Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks
Adrian Ly
Richard Dazeley
Peter Vamplew
Francisco Cruz
Sunil Aryal
107
13
0
07 Oct 2022
Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations
Francisco Roldan Sanchez
Qiang-qiang Wang
David Córdova Bulens
Kevin McGuinness
Stephen J. Redmond
Noel E. O'Connor
40
1
0
03 Oct 2022
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Konstantina Christakopoulou
Can Xu
Sai Zhang
Sriraj Badam
Trevor Potter
...
Ya Le
Chris Berg
E. B. Dixon
Ed H. Chi
Minmin Chen
OffRL
32
9
0
30 Sep 2022
Reinforcement Learning Algorithms: An Overview and Classification
Fadi AlMahamid
Katarina Grolinger
37
45
0
29 Sep 2022
Opportunities and Challenges from Using Animal Videos in Reinforcement Learning for Navigation
Vittorio Giammarino
James Queeney
Lucas C. Carstensen
Michael Hasselmo
I. Paschalidis
OffRL
78
5
0
25 Sep 2022
Human-level Atari 200x faster
Steven Kapturowski
Victor Campos
Ray Jiang
Nemanja Rakićević
Hado van Hasselt
Charles Blundell
Adria Puigdomenech Badia
OffRL
91
30
0
15 Sep 2022
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
Taku Yamagata
Ahmed Khalil
Raúl Santos-Rodríguez
OffRL
229
81
0
08 Sep 2022
Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review
Fadi AlMahamid
Katarina Grolinger
61
76
0
25 Aug 2022
An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms
Zaiwei Chen
S. T. Maguluri
80
1
0
05 Aug 2022
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach
Baturay Saglam
Dogan C. Cicek
Furkan B. Mutlu
Suleyman S. Kozat
OffRL
OnRL
85
1
0
01 Aug 2022
Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
Baturay Saglam
Dogan C. Cicek
Furkan B. Mutlu
Suleyman S. Kozat
OffRL
52
3
0
27 Jul 2022
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
Marc G. Bellemare
OffRL
67
12
0
15 Jul 2022
Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
J. MacGlashan
Evan Archer
A. Devlic
Takuma Seno
Craig Sherstan
Peter R. Wurman
Peter Stone
47
6
0
24 Jun 2022
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
Jiafei Lyu
Xiu Li
Zongqing Lu
OffRL
85
26
0
16 Jun 2022
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading
Zitao Song
Xuyang Jin
Chenliang Li
OffRL
AIFin
48
1
0
13 Jun 2022
Reinforcement Learning for Vision-based Object Manipulation with Non-parametric Policy and Action Primitives
Dongwon Son
Myungsin Kim
Jaecheol Sim
Wonsik Shin
46
1
0
12 Jun 2022
Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
Zhengyao Jiang
Tianjun Zhang
Robert Kirk
Tim Rocktaschel
Edward Grefenstette
OffRL
36
2
0
31 May 2022
RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
Y. Tan
Pihe Hu
L. Pan
Jiatai Huang
Longbo Huang
OffRL
69
24
0
30 May 2022
On the Robustness of Safe Reinforcement Learning under Observational Perturbations
Zuxin Liu
Zijian Guo
Zhepeng Cen
Huan Zhang
Jie Tan
Yue Liu
Ding Zhao
OOD
OffRL
100
37
0
29 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
142
9
0
20 May 2022
Towards biologically plausible Dreaming and Planning in recurrent spiking networks
C. Capone
P. Paolucci
CLL
33
7
0
20 May 2022
Automatic Parameter Optimization Using Genetic Algorithm in Deep Reinforcement Learning for Robotic Manipulation Tasks
Adarsh Sehgal
Nicholas Ward
Hung M. La
S. Louis
45
1
0
07 Apr 2022
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
85
14
0
06 Apr 2022
Marginalized Operators for Off-policy Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Michal Valko
OffRL
61
0
0
30 Mar 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL
OffRL
74
7
0
24 Mar 2022
Robust Action Gap Increasing with Clipped Advantage Learning
Zhe Zhang
Yaozhong Gan
Xiaoyang Tan
50
2
0
20 Mar 2022