1912.06111
Cited By
Sublinear Optimal Policy Value Estimation in Contextual Bandits
12 December 2019
Weihao Kong
Gregory Valiant
Emma Brunskill
OffRL
Papers citing "Sublinear Optimal Policy Value Estimation in Contextual Bandits"
11 / 11 papers shown
Value Driven Representation for Human-in-the-Loop Reinforcement Learning
Ramtin Keramati
Emma Brunskill
OffRL
18
3
0
02 Apr 2020
Off-Policy Policy Gradient with State Distribution Correction
Yao Liu
Adith Swaminathan
Alekh Agarwal
Emma Brunskill
OffRL
157
67
0
17 Apr 2019
Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift
Carles Gelada
Marc G. Bellemare
OffRL
57
99
0
27 Jan 2019
Estimating Learnability in the Sublinear Data Regime
Weihao Kong
Gregory Valiant
69
30
0
04 May 2018
Fully adaptive algorithm for pure exploration in linear bandits
Liyuan Xu
Junya Honda
Masashi Sugiyama
53
85
0
16 Oct 2017
Policy Learning with Observational Data
Susan Athey
Stefan Wager
CML
OffRL
447
183
0
09 Feb 2017
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users
Li Zhou
Emma Brunskill
41
62
0
22 Apr 2016
Best-Arm Identification in Linear Bandits
Marta Soare
A. Lazaric
Rémi Munos
65
178
0
22 Sep 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Alekh Agarwal
Daniel J. Hsu
Satyen Kale
John Langford
Lihong Li
Robert Schapire
OffRL
391
508
0
04 Feb 2014
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits
Kevin Jamieson
Matthew Malloy
Robert D. Nowak
Sébastien Bubeck
84
415
0
27 Dec 2013
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
459
2,949
0
28 Feb 2010