ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1606.02647
  4. Cited By
Safe and Efficient Off-Policy Reinforcement Learning
v1v2 (latest)

Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL
ArXiv (abs)PDFHTML

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Title
On the Optimality, Stability, and Feasibility of Control Barrier
  Functions: An Adaptive Learning-Based Approach
On the Optimality, Stability, and Feasibility of Control Barrier Functions: An Adaptive Learning-Based Approach
A. Chriat
Chuangchuang Sun
81
4
0
05 May 2023
Posterior Sampling for Deep Reinforcement Learning
Posterior Sampling for Deep Reinforcement Learning
Remo Sasso
Michelangelo Conserva
Paulo E. Rauber
OffRLBDL
58
7
0
30 Apr 2023
Multi-Task Reinforcement Learning in Continuous Control with Successor
  Feature-Based Concurrent Composition
Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition
Y. Liu
Aamir Ahmad
77
4
0
24 Mar 2023
Uncertainty-Aware Instance Reweighting for Off-Policy Learning
Uncertainty-Aware Instance Reweighting for Off-Policy Learning
Xiaoying Zhang
Junpu Chen
Hongning Wang
Hong Xie
Yang Liu
John C. S. Lui
Hang Li
OffRL
145
4
0
11 Mar 2023
Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End
  Policy and Optimistic Smooth Fictitious Play
Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play
Wei Xi
Yongxin Zhang
Changnan Xiao
Xuefeng Huang
Shihong Deng
Haowei Liang
Jie Chen
Peng Sun
OffRL
67
8
0
07 Mar 2023
Diminishing Return of Value Expansion Methods in Model-Based
  Reinforcement Learning
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
Daniel Palenicek
M. Lutter
João Carvalho
Jan Peters
79
4
0
07 Mar 2023
Sequential Counterfactual Risk Minimization
Sequential Counterfactual Risk Minimization
Houssam Zenati
Eustache Diemert
Matthieu Martin
Julien Mairal
Pierre Gaillard
OffRL
74
3
0
23 Feb 2023
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
Brendan O'Donoghue
OffRL
96
7
0
18 Feb 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
82
8
0
18 Feb 2023
Efficient Policy Evaluation with Offline Data Informed Behavior Policy
  Design
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
Shuze Liu
Shangtong Zhang
OffRL
99
5
0
31 Jan 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement
  Learning
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley
Martha White
Chris Amato
Marlos C. Machado
OffRL
138
3
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskil
Philip S. Thomas
OffRL
92
6
0
24 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&RoOffRLAI4CELRM
139
119
0
18 Jan 2023
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance
  System
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System
Habtamu Hailemichael
B. Ayalew
Lindsey Kerbel
Andrej Ivanco
K. Loiselle
115
4
0
03 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from
  Text Edits
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
189
36
0
01 Jan 2023
Control of Continuous Quantum Systems with Many Degrees of Freedom based
  on Convergent Reinforcement Learning
Control of Continuous Quantum Systems with Many Degrees of Freedom based on Convergent Reinforcement Learning
Zhikang T. Wang
48
0
0
21 Dec 2022
Residual Policy Learning for Powertrain Control
Residual Policy Learning for Powertrain Control
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
31
4
0
15 Dec 2022
Driver Assistance Eco-driving and Transmission Control with Deep
  Reinforcement Learning
Driver Assistance Eco-driving and Transmission Control with Deep Reinforcement Learning
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
OffRL
36
8
0
15 Dec 2022
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
70
0
0
10 Dec 2022
Design and Planning of Flexible Mobile Micro-Grids Using Deep
  Reinforcement Learning
Design and Planning of Flexible Mobile Micro-Grids Using Deep Reinforcement Learning
Cesare Caputo
Michel-Alexandre Cardin
Pudong Ge
Fei Teng
A. Korre
Ehecatl Antonio del Rio Chanona
39
18
0
08 Dec 2022
AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning
AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning
Hongjie Zhang
OffRL
38
0
0
28 Nov 2022
General Intelligence Requires Rethinking Exploration
General Intelligence Requires Rethinking Exploration
Minqi Jiang
Tim Rocktaschel
Edward Grefenstette
LRM
79
20
0
15 Nov 2022
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Peng Zhang
Yawen Huang
Bingzhang Hu
Shizheng Wang
Haoran Duan
Noura Al Moubayed
Yefeng Zheng
Yang Long
OffRL
60
0
0
02 Nov 2022
AACHER: Assorted Actor-Critic Deep Reinforcement Learning with Hindsight
  Experience Replay
AACHER: Assorted Actor-Critic Deep Reinforcement Learning with Hindsight Experience Replay
Adarsh Sehgal
Muskan Sehgal
Hung M. La
31
2
0
24 Oct 2022
Elastic Step DQN: A novel multi-step algorithm to alleviate
  overestimation in Deep QNetworks
Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks
Adrian Ly
Richard Dazeley
Peter Vamplew
Francisco Cruz
Sunil Aryal
107
13
0
07 Oct 2022
Hierarchical reinforcement learning for in-hand robotic manipulation
  using Davenport chained rotations
Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations
Francisco Roldan Sanchez
Qiang-qiang Wang
David Córdova Bulens
Kevin McGuinness
Stephen J. Redmond
Noel E. O'Connor
40
1
0
03 Oct 2022
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Konstantina Christakopoulou
Can Xu
Sai Zhang
Sriraj Badam
Trevor Potter
...
Ya Le
Chris Berg
E. B. Dixon
Ed H. Chi
Minmin Chen
OffRL
32
9
0
30 Sep 2022
Reinforcement Learning Algorithms: An Overview and Classification
Reinforcement Learning Algorithms: An Overview and Classification
Fadi AlMahamid
Katarina Grolinger
37
45
0
29 Sep 2022
Opportunities and Challenges from Using Animal Videos in Reinforcement
  Learning for Navigation
Opportunities and Challenges from Using Animal Videos in Reinforcement Learning for Navigation
Vittorio Giammarino
James Queeney
Lucas C. Carstensen
Michael Hasselmo
I. Paschalidis
OffRL
78
5
0
25 Sep 2022
Human-level Atari 200x faster
Human-level Atari 200x faster
Steven Kapturowski
Victor Campos
Ray Jiang
Nemanja Rakićević
Hado van Hasselt
Charles Blundell
Adria Puigdomenech Badia
OffRL
91
30
0
15 Sep 2022
Q-learning Decision Transformer: Leveraging Dynamic Programming for
  Conditional Sequence Modelling in Offline RL
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
Taku Yamagata
Ahmed Khalil
Raúl Santos-Rodríguez
OffRL
229
81
0
08 Sep 2022
Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement
  Learning: A Systematic Review
Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review
Fadi AlMahamid
Katarina Grolinger
61
76
0
25 Aug 2022
An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms
An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms
Zaiwei Chen
S. T. Maguluri
80
1
0
05 Aug 2022
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step
  Q-learning: A Novel Correction Approach
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach
Baturay Saglam
Dogan C. Cicek
Furkan B. Mutlu
Suleyman S. Kozat
OffRLOnRL
85
1
0
01 Aug 2022
Safe and Robust Experience Sharing for Deterministic Policy Gradient
  Algorithms
Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
Baturay Saglam
Dogan C. Cicek
Furkan B. Mutlu
Suleyman S. Kozat
OffRL
52
3
0
27 Jul 2022
The Nature of Temporal Difference Errors in Multi-step Distributional
  Reinforcement Learning
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
Marc G. Bellemare
OffRL
67
12
0
15 Jul 2022
Value Function Decomposition for Iterative Design of Reinforcement
  Learning Agents
Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
J. MacGlashan
Evan Archer
A. Devlic
Takuma Seno
Craig Sherstan
Peter R. Wurman
AI PeterStoneSony
47
6
0
24 Jun 2022
Double Check Your State Before Trusting It: Confidence-Aware
  Bidirectional Offline Model-Based Imagination
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
Jiafei Lyu
Xiu Li
Zongqing Lu
OffRL
85
26
0
16 Jun 2022
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning
  Implementation for High-Freq Stock Trading
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading
Zitao Song
Xuyang Jin
Chenliang Li
OffRLAIFin
48
1
0
13 Jun 2022
Reinforcement Learning for Vision-based Object Manipulation with
  Non-parametric Policy and Action Primitives
Reinforcement Learning for Vision-based Object Manipulation with Non-parametric Policy and Action Primitives
Dongwon Son
Myungsin Kim
Jaecheol Sim
Wonsik Shin
46
1
0
12 Jun 2022
Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
Graph Backup: Data Efficient Backup Exploiting Markovian Transitions
Zhengyao Jiang
Tianjun Zhang
Robert Kirk
Tim Rocktaschel
Edward Grefenstette
OffRL
36
2
0
31 May 2022
RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
Y. Tan
Pihe Hu
L. Pan
Jiatai Huang
Longbo Huang
OffRL
69
24
0
30 May 2022
On the Robustness of Safe Reinforcement Learning under Observational
  Perturbations
On the Robustness of Safe Reinforcement Learning under Observational Perturbations
Zuxin Liu
Zijian Guo
Zhepeng Cen
Huan Zhang
Jie Tan
Yue Liu
Ding Zhao
OODOffRL
100
37
0
29 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still
  Insufficient according to an Off-Policy Measure
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
142
9
0
20 May 2022
Towards biologically plausible Dreaming and Planning in recurrent
  spiking networks
Towards biologically plausible Dreaming and Planning in recurrent spiking networks
C. Capone
P. Paolucci
CLL
33
7
0
20 May 2022
Automatic Parameter Optimization Using Genetic Algorithm in Deep
  Reinforcement Learning for Robotic Manipulation Tasks
Automatic Parameter Optimization Using Genetic Algorithm in Deep Reinforcement Learning for Robotic Manipulation Tasks
Adarsh Sehgal
Nicholas Ward
Hung M. La
S. Louis
45
1
0
07 Apr 2022
Knowledge Infused Decoding
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
85
14
0
06 Apr 2022
Marginalized Operators for Off-policy Reinforcement Learning
Marginalized Operators for Off-policy Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Michal Valko
OffRL
61
0
0
30 Mar 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement
  Learning
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLLOffRL
74
7
0
24 Mar 2022
Robust Action Gap Increasing with Clipped Advantage Learning
Robust Action Gap Increasing with Clipped Advantage Learning
Zhe Zhang
Yaozhong Gan
Xiaoyang Tan
50
2
0
20 Mar 2022
Previous
12345678
Next