Combining policy gradient and Q-learning

Combining policy gradient and Q-learning

5 November 2016

Brendan O'Donoghue

Koray Kavukcuoglu

Papers citing "Combining policy gradient and Q-learning"

15 / 15 papers shown

Title
Bridging the Gap Between Value and Policy Based Reinforcement Learning Ofir Nachum Mohammad Norouzi Kelvin Xu Dale Schuurmans 156 471 0 28 Feb 2017
Reward Augmented Maximum Likelihood for Neural Structured Prediction Mohammad Norouzi Samy Bengio Zhiwen Chen Navdeep Jaitly M. Schuster Yonghui Wu Dale Schuurmans 77 253 0 01 Sep 2016
Asynchronous Methods for Deep Reinforcement Learning Volodymyr Mnih Adria Puigdomenech Badia M. Berk Mirza Alex Graves Timothy Lillicrap Tim Harley David Silver Koray Kavukcuoglu 191 8,850 0 04 Feb 2016
Taming the Noise in Reinforcement Learning via Soft Updates Roy Fox Ari Pakman Naftali Tishby 70 338 0 28 Dec 2015
Policy Gradient Methods for Off-policy Control Lucas Lehnert Doina Precup OffRL 31 4 0 13 Dec 2015
Dueling Network Architectures for Deep Reinforcement Learning Ziyun Wang Tom Schaul Matteo Hessel H. V. Hasselt Marc Lanctot Nando de Freitas OffRL 91 3,755 0 20 Nov 2015
Prioritized Experience Replay Tom Schaul John Quan Ioannis Antonoglou David Silver OffRL 214 3,787 0 18 Nov 2015
Deep Reinforcement Learning with Double Q-learning H. V. Hasselt A. Guez David Silver OffRL 158 7,635 0 22 Sep 2015
Continuous control with deep reinforcement learning Timothy Lillicrap Jonathan J. Hunt Alexander Pritzel N. Heess Tom Erez Yuval Tassa David Silver Daan Wierstra 318 13,234 0 09 Sep 2015
End-to-End Training of Deep Visuomotor Policies Sergey Levine Chelsea Finn Trevor Darrell Pieter Abbeel BDL 305 3,434 0 02 Apr 2015
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 277 6,764 0 19 Feb 2015
Playing Atari with Deep Reinforcement Learning Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra Martin Riedmiller 119 12,223 0 19 Dec 2013
Revisiting Natural Gradient for Deep Networks Razvan Pascanu Yoshua Bengio ODL 134 389 0 16 Jan 2013
The Arcade Learning Environment: An Evaluation Platform for General Agents Marc G. Bellemare Yavar Naddaf J. Veness Michael Bowling 109 3,004 0 19 Jul 2012
Dynamic Policy Programming M. G. Azar Vicencc Gómez H. Kappen 100 123 0 12 Apr 2010