A Practical Guide of Off-Policy Evaluation for Bandit Problems

23 October 2020

Papers citing "A Practical Guide of Off-Policy Evaluation for Bandit Problems"

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation. Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. 17 Aug 2020. [OffRL]
Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales. Masahiro Kato. 12 Jun 2020. [OffRL]
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift. Masahiro Kato, Masatoshi Uehara, Shota Yasui. 26 Feb 2020. [OffRL]
More Efficient Off-Policy Evaluation through Regularized Targeted Learning. Aurélien F. Bibaut, Ivana Malenica, N. Vlassis, Mark van der Laan. 13 Dec 2019. [OOD, OffRL]
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning. Nathan Kallus, Masatoshi Uehara. 09 Jun 2019. [OffRL]
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models. Michael Oberst, David Sontag. 14 May 2019. [CML, OffRL]
Efficient Counterfactual Learning from Bandit Feedback. Yusuke Narita, Shota Yasui, Kohei Yata. 10 Sep 2018. [OffRL]
More Robust Doubly Robust Off-policy Evaluation. Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh. 10 Feb 2018. [OffRL]
Offline A/B testing for Recommender Systems. Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, A. Abraham, Simon Dollé. 22 Jan 2018. [OffRL]
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits. Yu Wang, Alekh Agarwal, Miroslav Dudík. 04 Dec 2016. [OffRL]
Batched bandit problems. Vianney Perchet, Philippe Rigollet, Sylvain Chassang, E. Snowberg. 02 May 2015. [OffRL]
Collaborative Filtering Bandits. Shuai Li, Alexandros Karatzoglou, Claudio Gentile. 11 Feb 2015.
Doubly Robust Policy Evaluation and Learning. Miroslav Dudík, John Langford, Lihong Li. 23 Mar 2011. [OffRL]
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms. Lihong Li, Wei Chu, John Langford, Xuanhui Wang. 31 Mar 2010. [OffRL]
A Contextual-Bandit Approach to Personalized News Article Recommendation. Lihong Li, Wei Chu, John Langford, Robert Schapire. 28 Feb 2010.
The Offset Tree for Learning with Partial Labels. A. Beygelzimer, John Langford. 21 Dec 2008.