
Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

24 / 374 papers shown

Title
Learning with Options that Terminate Off-Policy Anna Harutyunyan Peter Vrancx Pierre-Luc Bacon Doina Precup A. Nowé OffRL 127 28 0 10 Nov 2017
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning Marc Lanctot V. Zambaldi A. Gruslys Angeliki Lazaridou K. Tuyls Julien Perolat David Silver T. Graepel 147 639 0 02 Nov 2017
On- and Off-Policy Monotonic Policy Improvement R. Iwaki Minoru Asada OffRL 29 0 0 10 Oct 2017
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents Marlos C. Machado Marc G. Bellemare Erik Talvitie J. Veness Matthew J. Hausknecht Michael Bowling 114 558 0 18 Sep 2017
A Brief Survey of Deep Reinforcement Learning Kai Arulkumaran M. Deisenroth Miles Brundage Anil Anthony Bharath OffRL 143 2,830 0 19 Aug 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning S. Gu Timothy Lillicrap Zoubin Ghahramani Richard Turner Bernhard Schölkopf Sergey Levine OffRL 96 164 0 01 Jun 2017
Convergent Tree Backup and Retrace with Function Approximation Ahmed Touati Pierre-Luc Bacon Doina Precup Pascal Vincent 106 40 0 25 May 2017
Guide Actor-Critic for Continuous Control Voot Tangkaratt A. Abdolmaleki Masashi Sugiyama 67 17 0 22 May 2017
Discrete Sequential Prediction of Continuous Actions for Deep RL Luke Metz Julian Ibarz Navdeep Jaitly James Davidson BDL OffRL 92 120 0 14 May 2017
Investigating Recurrence and Eligibility Traces in Deep Q-Networks J. Harb Doina Precup 54 21 0 18 Apr 2017
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning A. Gruslys Will Dabney M. G. Azar Bilal Piot Marc G. Bellemare Rémi Munos 76 58 0 15 Apr 2017
On Generalized Bellman Equations and Temporal-Difference Learning Huizhen Yu A. R. Mahmood R. Sutton 118 29 0 14 Apr 2017
Deep Exploration via Randomized Value Functions Ian Osband Benjamin Van Roy Daniel Russo Zheng Wen 116 307 0 22 Mar 2017
Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation Z. Guo Philip S. Thomas Emma Brunskill OffRL 128 2 0 09 Mar 2017
Neural Episodic Control Alexander Pritzel Benigno Uria Sriram Srinivasan A. Badia Oriol Vinyals Demis Hassabis Daan Wierstra Charles Blundell OffRL BDL 113 346 0 06 Mar 2017
Count-Based Exploration with Neural Density Models Georg Ostrovski Marc G. Bellemare Aaron van den Oord Rémi Munos 104 626 0 03 Mar 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning Ofir Nachum Mohammad Norouzi Kelvin Xu Dale Schuurmans 203 478 0 28 Feb 2017
Reinforcement Learning Algorithm Selection Romain Laroche Raphael Feraud OffRL 74 8 0 30 Jan 2017
Deep Reinforcement Learning: An Overview Yuxi Li OffRL VLM 346 1,549 0 25 Jan 2017
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic S. Gu Timothy Lillicrap Zoubin Ghahramani Richard Turner Sergey Levine OffRL BDL 106 345 0 07 Nov 2016
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening Frank S. He Yang Liu Alex Schwing Jian-wei Peng 91 84 0 05 Nov 2016
Sample Efficient Actor-Critic with Experience Replay Ziyun Wang V. Bapst N. Heess Volodymyr Mnih Rémi Munos Koray Kavukcuoglu Nando de Freitas 129 762 0 03 Nov 2016
Unifying Count-Based Exploration and Intrinsic Motivation Marc G. Bellemare S. Srinivasan Georg Ostrovski Tom Schaul D. Saxton Rémi Munos 195 1,485 0 06 Jun 2016
Q(λ) with Off-Policy Corrections Anna Harutyunyan Marc G. Bellemare T. Stepleton Rémi Munos OffRL 99 96 0 16 Feb 2016