Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 374 papers shown
Dynamics-aware Embeddings
William F. Whitney
Rajat Agarwal
Kyunghyun Cho
Abhinav Gupta
SSL
78
53
0
25 Aug 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
128
187
0
22 Aug 2019
A survey on intrinsic motivation in reinforcement learning
A. Aubret
L. Matignon
S. Hassas
AI4CE
112
144
0
19 Aug 2019
Off-policy Learning for Multiple Loggers
Li He
Long Xia
Wei Zeng
Zhi-Ming Ma
Yue Zhao
Dawei Yin
OffRL
57
10
0
23 Jul 2019
Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling
Yuping Luo
Huazhe Xu
Tengyu Ma
SSL
78
13
0
12 Jul 2019
Modified Actor-Critics
Erinc Merdivan
S. Hanke
Matthieu Geist
43
2
0
02 Jul 2019
Learning the Arrow of Time
Nasim Rahaman
Steffen Wolf
Anirudh Goyal
Roman Remme
Yoshua Bengio
61
5
0
02 Jul 2019
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques
Asma Ghandeharioun
J. Shen
Craig Ferguson
Àgata Lapedriza
Noah J. Jones
S. Gu
Rosalind W. Picard
OffRL
157
343
0
30 Jun 2019
Learning Policies through Quantile Regression
Oliver Richter
Roger Wattenhofer
44
0
0
27 Jun 2019
Compositional Transfer in Hierarchical Reinforcement Learning
Markus Wulfmeier
A. Abdolmaleki
Roland Hafner
Jost Tobias Springenberg
Michael Neunert
Tim Hertweck
Thomas Lampe
Noah Y. Siegel
N. Heess
Martin Riedmiller
108
27
0
26 Jun 2019
Ranking Policy Gradient
Kaixiang Lin
Jiayu Zhou
OffRL
67
7
0
24 Jun 2019
Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning
Tadashi Kozuno
Dongqi Han
Kenji Doya
OffRL
43
2
0
18 Jun 2019
When to use parametric models in reinforcement learning?
H. V. Hasselt
Matteo Hessel
John Aslanides
87
196
0
12 Jun 2019
Importance Resampling for Off-policy Prediction
M. Schlegel
Wesley Chung
Daniel Graves
Jian Qian
Martha White
OffRL
57
41
0
11 Jun 2019
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
99
54
0
09 Jun 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Aviral Kumar
Justin Fu
George Tucker
Sergey Levine
OffRL, OnRL
156
1,068
0
03 Jun 2019
Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs
Luchen Li
Matthieu Komorowski
Aldo A. Faisal
OffRL
139
13
0
17 May 2019
Meta reinforcement learning as task inference
Jan Humplik
Alexandre Galashov
Leonard Hasenclever
Pedro A. Ortega
Yee Whye Teh
N. Heess
OffRL
120
128
0
15 May 2019
Trajectory-Based Off-Policy Deep Reinforcement Learning
Andreas Doerr
Michael Volpp
Marc Toussaint
Sebastian Trimpe
Christian Daniel
OffRL
61
2
0
14 May 2019
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Seungyul Han
Y. Sung
OffRL
63
20
0
07 May 2019
P3O: Policy-on Policy-off Policy Optimization
Rasool Fakoor
Pratik Chaudhari
Alex Smola
OffRL
93
56
0
05 May 2019
Information asymmetry in KL-regularized RL
Alexandre Galashov
Siddhant M. Jayakumar
Leonard Hasenclever
Dhruva Tirumala
Jonathan Richard Schwarz
Guillaume Desjardins
Wojciech M. Czarnecki
Yee Whye Teh
Razvan Pascanu
N. Heess
OffRL
67
104
0
03 May 2019
Structured agents for physical construction
V. Bapst
Alvaro Sanchez-Gonzalez
Carl Doersch
Kimberly L. Stachenfeld
Pushmeet Kohli
Peter W. Battaglia
Jessica B. Hamrick
AI4CE
133
99
0
05 Apr 2019
Multitask Soft Option Learning
Maximilian Igl
Andrew Gambardella
Jinke He
Nantas Nardelli
N. Siddharth
Wendelin Bohmer
Shimon Whiteson
187
26
0
01 Apr 2019
Meta-Learning surrogate models for sequential decision making
Alexandre Galashov
Jonathan Richard Schwarz
Hyunjik Kim
M. Garnelo
D. Saxton
Pushmeet Kohli
S. M. Ali Eslami
Yee Whye Teh
BDL, OffRL
95
25
0
28 Mar 2019
Generalized Off-Policy Actor-Critic
Shangtong Zhang
Wendelin Bohmer
Shimon Whiteson
OffRL, CML
151
43
0
27 Mar 2019
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
Dhruva Tirumala
Hyeonwoo Noh
Alexandre Galashov
Leonard Hasenclever
Arun Ahuja
Greg Wayne
Razvan Pascanu
Yee Whye Teh
N. Heess
OffRL
72
44
0
18 Mar 2019
A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Wesley A Suttle
Zhuoran Yang
Kai Zhang
Zhaoran Wang
Tamer Basar
Ji Liu
OffRL
84
63
0
15 Mar 2019
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Denis Steckelmacher
Hélène Plisnier
D. Roijers
A. Nowé
OffRL
67
17
0
11 Mar 2019
Diagnosing Bottlenecks in Deep Q-learning Algorithms
Justin Fu
Aviral Kumar
Matthew Soh
Sergey Levine
OffRL
85
142
0
26 Feb 2019
Distributionally Robust Reinforcement Learning
E. Smirnova
Elvis Dohmatob
Jérémie Mary
OffRL
68
60
0
23 Feb 2019
World Discovery Models
M. G. Azar
Bilal Piot
Bernardo Avila-Pires
Jean-Bastien Grill
Florent Altché
Rémi Munos
117
26
0
20 Feb 2019
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
Marek Petrik
R. Russel
91
61
0
20 Feb 2019
Emergent Coordination Through Competition
Siqi Liu
Guy Lever
J. Merel
S. Tunyasuvunakool
N. Heess
T. Graepel
123
151
0
19 Feb 2019
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
Gang Chen
Yiming Peng
38
8
0
14 Feb 2019
Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup
Devin Schwab
Tobias Springenberg
M. Martins
Thomas Lampe
Michael Neunert
A. Abdolmaleki
Tim Hertweck
Roland Hafner
F. Nori
Martin Riedmiller
76
22
0
13 Feb 2019
Value constrained model-free continuous control
Steven Bohez
A. Abdolmaleki
Michael Neunert
J. Buchli
N. Heess
R. Hadsell
68
63
0
12 Feb 2019
A Theory of Regularized Markov Decision Processes
Matthieu Geist
B. Scherrer
Olivier Pietquin
144
333
0
31 Jan 2019
Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift
Carles Gelada
Marc G. Bellemare
OffRL
76
99
0
27 Jan 2019
Robust Temporal Difference Learning for Critical Domains
R. Klíma
D. Bloembergen
Michael Kaisers
K. Tuyls
AAML
45
13
0
23 Jan 2019
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
J. F. Hernandez-Garcia
R. Sutton
77
63
0
22 Jan 2019
ReNeg and Backseat Driver: Learning from Demonstration with Continuous Human Feedback
Jacob Beck
Zoe Papakipos
Michael Littman
24
1
0
16 Jan 2019
Imitation-Regularized Offline Learning
Yifei Ma
Yu Wang
Balakrishnan Narayanaswamy
OffRL
61
22
0
15 Jan 2019
Self-supervised Learning of Image Embedding for Continuous Control
Carlos Florensa
Jonas Degrave
N. Heess
Jost Tobias Springenberg
Martin Riedmiller
SSL
58
53
0
03 Jan 2019
TD-Regularized Actor-Critic Methods
Simone Parisi
Voot Tangkaratt
Jan Peters
Mohammad Emtiyaz Khan
OffRL
61
31
0
19 Dec 2018
Dopamine: A Research Framework for Deep Reinforcement Learning
Pablo Samuel Castro
Subhodeep Moitra
Carles Gelada
Saurabh Kumar
Marc G. Bellemare
OffRL
84
279
0
14 Dec 2018
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto
David Meger
Doina Precup
OffRL, BDL
299
1,626
0
07 Dec 2018
Top-K Off-Policy Correction for a REINFORCE Recommender System
Minmin Chen
Alex Beutel
Paul Covington
Sagar Jain
Francois Belletti
Ed H. Chi
CML, OffRL
149
485
0
06 Dec 2018
Relative Entropy Regularized Policy Iteration
A. Abdolmaleki
Jost Tobias Springenberg
Jonas Degrave
Steven Bohez
Yuval Tassa
Dan Belov
N. Heess
Martin Riedmiller
68
72
0
05 Dec 2018
An Introduction to Deep Reinforcement Learning
Vincent François-Lavet
Peter Henderson
Riashat Islam
Marc G. Bellemare
Joelle Pineau
OffRL, AI4CE
173
1,279
0
30 Nov 2018