Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

24 / 374 papers shown
Learning with Options that Terminate Off-Policy
Anna Harutyunyan
Peter Vrancx
Pierre-Luc Bacon
Doina Precup
A. Nowé
OffRL
127
28
0
10 Nov 2017
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
Marc Lanctot
V. Zambaldi
A. Gruslys
Angeliki Lazaridou
K. Tuyls
Julien Perolat
David Silver
T. Graepel
147
639
0
02 Nov 2017
On- and Off-Policy Monotonic Policy Improvement
R. Iwaki
Minoru Asada
OffRL
29
0
0
10 Oct 2017
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Marlos C. Machado
Marc G. Bellemare
Erik Talvitie
J. Veness
Matthew J. Hausknecht
Michael Bowling
114
558
0
18 Sep 2017
A Brief Survey of Deep Reinforcement Learning
Kai Arulkumaran
M. Deisenroth
Miles Brundage
Anil Anthony Bharath
OffRL
143
2,830
0
19 Aug 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Bernhard Schölkopf
Sergey Levine
OffRL
96
164
0
01 Jun 2017
Convergent Tree Backup and Retrace with Function Approximation
Ahmed Touati
Pierre-Luc Bacon
Doina Precup
Pascal Vincent
106
40
0
25 May 2017
Guide Actor-Critic for Continuous Control
Voot Tangkaratt
A. Abdolmaleki
Masashi Sugiyama
67
17
0
22 May 2017
Discrete Sequential Prediction of Continuous Actions for Deep RL
Luke Metz
Julian Ibarz
Navdeep Jaitly
James Davidson
BDL, OffRL
92
120
0
14 May 2017
Investigating Recurrence and Eligibility Traces in Deep Q-Networks
J. Harb
Doina Precup
54
21
0
18 Apr 2017
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
A. Gruslys
Will Dabney
M. G. Azar
Bilal Piot
Marc G. Bellemare
Rémi Munos
76
58
0
15 Apr 2017
On Generalized Bellman Equations and Temporal-Difference Learning
Huizhen Yu
A. R. Mahmood
R. Sutton
118
29
0
14 Apr 2017
Deep Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Daniel Russo
Zheng Wen
116
307
0
22 Mar 2017
Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation
Z. Guo
Philip S. Thomas
Emma Brunskill
OffRL
128
2
0
09 Mar 2017
Neural Episodic Control
Alexander Pritzel
Benigno Uria
Sriram Srinivasan
A. Badia
Oriol Vinyals
Demis Hassabis
Daan Wierstra
Charles Blundell
OffRL, BDL
113
346
0
06 Mar 2017
Count-Based Exploration with Neural Density Models
Georg Ostrovski
Marc G. Bellemare
Aaron van den Oord
Rémi Munos
104
626
0
03 Mar 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
203
478
0
28 Feb 2017
Reinforcement Learning Algorithm Selection
Romain Laroche
Raphael Feraud
OffRL
74
8
0
30 Jan 2017
Deep Reinforcement Learning: An Overview
Yuxi Li
OffRL, VLM
346
1,549
0
25 Jan 2017
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL, BDL
106
345
0
07 Nov 2016
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
Frank S. He
Yang Liu
Alex Schwing
Jian-wei Peng
91
84
0
05 Nov 2016
Sample Efficient Actor-Critic with Experience Replay
Ziyun Wang
V. Bapst
N. Heess
Volodymyr Mnih
Rémi Munos
Koray Kavukcuoglu
Nando de Freitas
129
762
0
03 Nov 2016
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare
S. Srinivasan
Georg Ostrovski
Tom Schaul
D. Saxton
Rémi Munos
195
1,485
0
06 Jun 2016
Q(λ) with Off-Policy Corrections
Anna Harutyunyan
Marc G. Bellemare
T. Stepleton
Rémi Munos
OffRL
99
96
0
16 Feb 2016