Empirical Likelihood for Contextual Bandits

7 June 2019

Nikos Karampatziakis

Papers citing "Empirical Likelihood for Contextual Bandits"

14 / 14 papers shown

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning Nathan Kallus Masatoshi Uehara OffRL 42 54 0 09 Jun 2019
Off-Policy Policy Gradient with State Distribution Correction Yao Liu Adith Swaminathan Alekh Agarwal Emma Brunskill OffRL 61 67 0 17 Apr 2019
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures L. Espeholt Hubert Soyer Rémi Munos Karen Simonyan Volodymyr Mnih ... Vlad Firoiu Tim Harley Iain Dunning Shane Legg Koray Kavukcuoglu 127 1,584 0 05 Feb 2018
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits Yu Wang Alekh Agarwal Miroslav Dudík OffRL 47 220 0 04 Dec 2016
Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach John C. Duchi Peter Glynn Hongseok Namkoong 82 321 0 11 Oct 2016
Off-policy evaluation for slate recommendation Adith Swaminathan A. Krishnamurthy Alekh Agarwal Miroslav Dudík John Langford Damien Jose I. Zitouni CML OffRL 28 225 0 16 May 2016
Trust Region Policy Optimization John Schulman Sergey Levine Philipp Moritz Michael I. Jordan Pieter Abbeel 221 6,722 0 19 Feb 2015
Multinomial and empirical likelihood under convex constraints: directions of recession, Fenchel duality, perturbations Marian Grendár Vladimír Špitalský 37 3 0 24 Aug 2014
OpenML: networked science in machine learning Joaquin Vanschoren Jan N. van Rijn B. Bischl Luís Torgo FedML AI4CE 62 1,310 0 29 Jul 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal Daniel J. Hsu Satyen Kale John Langford Lihong Li Robert Schapire OffRL 111 504 0 04 Feb 2014
Kullback-Leibler upper confidence bounds for optimal sequential allocation Olivier Cappé Aurélien Garivier Odalric-Ambrym Maillard Rémi Munos Gilles Stoltz 58 394 0 03 Oct 2012
Counterfactual Reasoning and Learning Systems Léon Bottou J. Peters J. Q. Candela Denis Xavier Charles D. M. Chickering Elon Portugaly Dipankar Ray Patrice Y. Simard Edward Snelson CML OffRL 119 781 0 11 Sep 2012
Doubly Robust Policy Evaluation and Learning Miroslav Dudík John Langford Lihong Li OffRL 95 694 0 23 Mar 2011
A Contextual-Bandit Approach to Personalized News Article Recommendation Lihong Li Wei Chu John Langford Robert Schapire 200 2,935 0 28 Feb 2010