Sublinear Optimal Policy Value Estimation in Contextual Bandits

12 December 2019

Papers citing "Sublinear Optimal Policy Value Estimation in Contextual Bandits"

Title | Authors | Tags | Counts | Date
Value Driven Representation for Human-in-the-Loop Reinforcement Learning | Ramtin Keramati, Emma Brunskill | OffRL | 18 3 0 | 02 Apr 2020
Off-Policy Policy Gradient with State Distribution Correction | Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill | OffRL | 157 67 0 | 17 Apr 2019
Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift | Carles Gelada, Marc G. Bellemare | OffRL | 57 99 0 | 27 Jan 2019
Estimating Learnability in the Sublinear Data Regime | Weihao Kong, Gregory Valiant | - | 69 30 0 | 04 May 2018
Fully adaptive algorithm for pure exploration in linear bandits | Liyuan Xu, Junya Honda, Masashi Sugiyama | - | 53 85 0 | 16 Oct 2017
Policy Learning with Observational Data | Susan Athey, Stefan Wager | CML, OffRL | 447 183 0 | 09 Feb 2017
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users | Li Zhou, Emma Brunskill | - | 41 62 0 | 22 Apr 2016
Best-Arm Identification in Linear Bandits | Marta Soare, A. Lazaric, Rémi Munos | - | 65 178 0 | 22 Sep 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits | Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire | OffRL | 391 508 0 | 04 Feb 2014
lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits | Kevin Jamieson, Matthew Malloy, Robert D. Nowak, Sébastien Bubeck | - | 84 415 0 | 27 Dec 2013
A Contextual-Bandit Approach to Personalized News Article Recommendation | Lihong Li, Wei Chu, John Langford, Robert Schapire | - | 459 2,949 0 | 28 Feb 2010