ResearchTrend.AI
arXiv: 1909.05236
Safe Policy Improvement with an Estimated Baseline Policy

11 September 2019
T. D. Simão
Romain Laroche
Rémi Tachet des Combes
    OffRL

Papers citing "Safe Policy Improvement with an Estimated Baseline Policy"

12 / 12 papers shown
Safe Policy Improvement with Soft Baseline Bootstrapping
Kimia Nadjahi
Romain Laroche
Rémi Tachet des Combes
OffRL
60
36
0
11 Jul 2019
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Aviral Kumar
Justin Fu
George Tucker
Sergey Levine
OffRL, OnRL
137
1,066
0
03 Jun 2019
Decentralized Exploration in Multi-Armed Bandits -- Extended version
Raphael Feraud
Réda Alami
Romain Laroche
FedML
86
22
0
19 Nov 2018
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
Josiah P. Hanna
S. Niekum
Peter Stone
OffRL
52
68
0
04 Jun 2018
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
Leshem Choshen
Lior Fox
Y. Loewenstein
OffRL
99
64
0
11 Apr 2018
Safe Policy Improvement with Baseline Bootstrapping
Romain Laroche
P. Trichelair
Rémi Tachet des Combes
OffRL
69
201
0
19 Dec 2017
Hybrid Reward Architecture for Reinforcement Learning
H. V. Seijen
Mehdi Fatemi
Joshua Romoff
Romain Laroche
Tavian Barnes
Jeffrey Tsang
60
253
0
13 Jun 2017
Safe Policy Improvement by Minimizing Robust Baseline Regret
Marek Petrik
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
95
134
0
13 Jul 2016
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare
S. Srinivasan
Georg Ostrovski
Tom Schaul
D. Saxton
Rémi Munos
179
1,484
0
06 Jun 2016
Deep Reinforcement Learning with Double Q-learning
H. V. Hasselt
A. Guez
David Silver
OffRL
175
7,665
0
22 Sep 2015
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Iulian Serban
Alessandro Sordoni
Yoshua Bengio
Aaron Courville
Joelle Pineau
AILaw
171
1,756
0
17 Jul 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
279
6,801
0
19 Feb 2015