Safe Policy Improvement with an Estimated Baseline Policy



11 September 2019

Rémi Tachet des Combes


Papers citing "Safe Policy Improvement with an Estimated Baseline Policy"


Title
Safe Policy Improvement with Soft Baseline Bootstrapping Kimia Nadjahi Romain Laroche Rémi Tachet des Combes OffRL 60 36 0 11 Jul 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction Aviral Kumar Justin Fu George Tucker Sergey Levine OffRL OnRL 137 1,066 0 03 Jun 2019
Decentralized Exploration in Multi-Armed Bandits -- Extended version Raphael Feraud Réda Alami Romain Laroche FedML 86 22 0 19 Nov 2018
Importance Sampling Policy Evaluation with an Estimated Behavior Policy Josiah P. Hanna S. Niekum Peter Stone OffRL 52 68 0 04 Jun 2018
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection Leshem Choshen Lior Fox Y. Loewenstein OffRL 99 64 0 11 Apr 2018
Safe Policy Improvement with Baseline Bootstrapping Romain Laroche P. Trichelair Rémi Tachet des Combes OffRL 69 201 0 19 Dec 2017
Hybrid Reward Architecture for Reinforcement Learning H. V. Seijen Mehdi Fatemi Joshua Romoff Romain Laroche Tavian Barnes Jeffrey Tsang 60 253 0 13 Jun 2017
Safe Policy Improvement by Minimizing Robust Baseline Regret Marek Petrik Yinlam Chow Mohammad Ghavamzadeh OffRL 95 134 0 13 Jul 2016
Unifying Count-Based Exploration and Intrinsic Motivation Marc G. Bellemare S. Srinivasan Georg Ostrovski Tom Schaul D. Saxton Rémi Munos 179 1,484 0 06 Jun 2016
Deep Reinforcement Learning with Double Q-learning H. V. Hasselt A. Guez David Silver OffRL 175 7,665 0 22 Sep 2015
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models Iulian Serban Alessandro Sordoni Yoshua Bengio Aaron Courville Joelle Pineau AILaw 171 1,756 0 17 Jul 2015
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 279 6,801 0 19 Feb 2015