Safe and Efficient Off-Policy Reinforcement Learning
arXiv:1606.02647 (v2, latest)

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Neural probabilistic motor primitives for humanoid control
J. Merel
Leonard Hasenclever
Alexandre Galashov
Arun Ahuja
Vu Pham
Greg Wayne
Yee Whye Teh
N. Heess
114
161
0
28 Nov 2018
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
148
1,178
0
28 Nov 2018
Hierarchical visuomotor control of humanoids
J. Merel
Arun Ahuja
Vu Pham
S. Tunyasuvunakool
Siqi Liu
Dhruva Tirumala
N. Heess
Greg Wayne
115
97
0
23 Nov 2018
The Barbados 2018 List of Open Issues in Continual Learning
Tom Schaul
H. V. Hasselt
Joseph Modayil
Martha White
Adam White
Pierre-Luc Bacon
J. Harb
Shibl Mourad
Marc G. Bellemare
Doina Precup
LM&Ro 3DV
69
10
0
16 Nov 2018
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Lars Buesing
T. Weber
Yori Zwols
S. Racanière
A. Guez
Jean-Baptiste Lespiau
N. Heess
CML
121
138
0
15 Nov 2018
Importance Weighted Evolution Strategies
Victor Campos
Xavier Giró-i-Nieto
Jordi Torres
41
1
0
12 Nov 2018
Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
Kendall Lowrey
Aravind Rajeswaran
Sham Kakade
G. Haro
Igor Mordatch
OffRL
79
229
0
05 Nov 2018
VIREL: A Variational Inference Framework for Reinforcement Learning
M. Fellows
Anuj Mahajan
Tim G. J. Rudner
Shimon Whiteson
DRL
106
56
0
03 Nov 2018
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
Mahammad Humayoo
Xueqi Cheng
BDL OffRL
45
5
0
30 Oct 2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
194
357
0
29 Oct 2018
Reconciling λ-Returns with Experience Replay
Brett Daley
Chris Amato
59
4
0
23 Oct 2018
Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
David Janz
Jiri Hron
Przemysław Mazur
Katja Hofmann
José Miguel Hernández-Lobato
Sebastian Tschiatschek
163
52
0
15 Oct 2018
Deep Reinforcement Learning
Yuxi Li
VLM OffRL
194
144
0
15 Oct 2018
Verification for Machine Learning, Autonomy, and Neural Networks Survey
Weiming Xiang
Patrick Musau
A. Wild
Diego Manzanas Lopez
Nathaniel P. Hamilton
Xiaodong Yang
Joel A. Rosenfeld
Taylor T. Johnson
93
102
0
03 Oct 2018
Efficient Dialog Policy Learning via Positive Memory Retention
Rui Zhao
Volker Tresp
95
10
0
02 Oct 2018
Policy Optimization via Importance Sampling
Alberto Maria Metelli
Matteo Papini
Francesco Faccio
Marcello Restelli
OffRL
99
90
0
17 Sep 2018
Efficient Counterfactual Learning from Bandit Feedback
Yusuke Narita
Shota Yasui
Kohei Yata
OffRL
105
49
0
10 Sep 2018
Remember and Forget for Experience Replay
G. Novati
Petros Koumoutsakos
OffRL
108
92
0
16 Jul 2018
Deep-Reinforcement-Learning for Gliding and Perching Bodies
G. Novati
L. Mahadevan
Petros Koumoutsakos
77
9
0
07 Jul 2018
Per-decision Multi-step Temporal Difference Learning with Control Variates
Kristopher De Asis
R. Sutton
80
7
0
05 Jul 2018
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman
Danijar Hafner
George Tucker
E. Brevdo
Honglak Lee
97
333
0
04 Jul 2018
RUDDER: Return Decomposition for Delayed Rewards
Jose A. Arjona-Medina
Michael Gillhofer
Michael Widrich
Thomas Unterthiner
Johannes Brandstetter
Sepp Hochreiter
130
222
0
20 Jun 2018
Self-Imitation Learning
Junhyuk Oh
Yijie Guo
Satinder Singh
Honglak Lee
SSL
85
251
0
14 Jun 2018
Maximum a Posteriori Policy Optimisation
A. Abdolmaleki
Jost Tobias Springenberg
Yuval Tassa
Rémi Munos
N. Heess
Martin Riedmiller
87
478
0
14 Jun 2018
Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network
Wenjia Meng
Qian Zheng
L. Yang
Pengfei Li
Gang Pan
45
21
0
14 Jun 2018
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Josiah P. Hanna
S. Niekum
Peter Stone
OffRL
61
68
0
04 Jun 2018
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
Su Young Lee
Sung-Ik Choi
Sae-Young Chung
BDL
77
75
0
31 May 2018
Meta-Gradient Reinforcement Learning
Zhongwen Xu
H. V. Hasselt
David Silver
117
327
0
24 May 2018
Deep Reinforcement Learning For Sequence to Sequence Models
Yaser Keneshloo
Tian Shi
Naren Ramakrishnan
Chandan K. Reddy
AIMat 3DV OffRL
92
211
0
24 May 2018
Data-Efficient Hierarchical Reinforcement Learning
Ofir Nachum
S. Gu
Honglak Lee
Sergey Levine
OffRL
102
814
0
21 May 2018
Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
Elad Sarafian
Aviv Tamar
Sarit Kraus
OffRL
60
11
0
20 May 2018
Episodic Memory Deep Q-Networks
Zichuan Lin
Tianqi Zhao
Guangwen Yang
Lintao Zhang
OffRL
61
87
0
19 May 2018
Smoothed Action Value Functions for Learning Gaussian Policies
Ofir Nachum
Mohammad Norouzi
George Tucker
Dale Schuurmans
88
28
0
06 Mar 2018
Learning by Playing - Solving Sparse Reward Tasks from Scratch
Martin Riedmiller
Roland Hafner
Thomas Lampe
Michael Neunert
Jonas Degrave
T. Wiele
Volodymyr Mnih
N. Heess
Jost Tobias Springenberg
113
450
0
28 Feb 2018
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto
H. V. Hoof
David Meger
OffRL
338
5,244
0
26 Feb 2018
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
Matthias Plappert
Marcin Andrychowicz
Alex Ray
Bob McGrew
Bowen Baker
...
Joshua Tobin
Maciek Chociej
Peter Welinder
Vikash Kumar
Wojciech Zaremba
75
573
0
26 Feb 2018
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Kai Zhang
Zhuoran Yang
Han Liu
Tong Zhang
Tamer Basar
150
592
0
23 Feb 2018
Unicorn: Continual Learning with a Universal, Off-policy Agent
D. Mankowitz
Augustin Žídek
André Barreto
Dan Horgan
Matteo Hessel
John Quan
Junhyuk Oh
H. V. Hasselt
David Silver
Tom Schaul
CLL OffRL
70
48
0
22 Feb 2018
Reinforcement Learning from Imperfect Demonstrations
Yang Gao
Huazhe Xu
Ji Lin
Feng Yu
Sergey Levine
Trevor Darrell
84
202
0
14 Feb 2018
Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces
Gellert Weisz
Paweł Budzianowski
Pei-hao Su
Milica Gasic
47
83
0
11 Feb 2018
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
93
270
0
10 Feb 2018
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Long Yang
Minhao Shi
Qian Zheng
Wenjia Meng
Gang Pan
78
24
0
09 Feb 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
L. Espeholt
Hubert Soyer
Rémi Munos
Karen Simonyan
Volodymyr Mnih
...
Vlad Firoiu
Tim Harley
Iain Dunning
Shane Legg
Koray Kavukcuoglu
276
1,609
0
05 Feb 2018
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Xiaoqin Zhang
Huimin Ma
OffRL
115
38
0
31 Jan 2018
Faster Deep Q-learning using Neural Episodic Control
Daichi Nishio
S. Yamane
OffRL
82
10
0
06 Jan 2018
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
Huizhen Yu
OffRL
103
32
0
27 Dec 2017
A Flexible Approach to Automated RNN Architecture Generation
Martin Schrimpf
Stephen Merity
James Bradbury
R. Socher
59
16
0
20 Dec 2017
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient
Li Zhou
Kevin Small
Oleg Rokhlenko
Charles Elkan
OffRL
76
42
0
07 Dec 2017
Classification with Costly Features using Deep Reinforcement Learning
Jaromír Janisch
Tomás Pevný
Viliam Lisý
OffRL
74
98
0
20 Nov 2017
Hindsight policy gradients
Paulo E. Rauber
Avinash Ummadisingu
Filipe Wall Mutz
J. Schmidhuber
80
68
0
16 Nov 2017