2010.12470
A Practical Guide of Off-Policy Evaluation for Bandit Problems
23 October 2020
Masahiro Kato
Kenshi Abe
Kaito Ariu
Shota Yasui
OffRL
Papers citing
"A Practical Guide of Off-Policy Evaluation for Bandit Problems"
16 / 16 papers shown
Title
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
108
75
0
17 Aug 2020
Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales
Masahiro Kato
OffRL
22
2
0
12 Jun 2020
Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
Masahiro Kato
Masatoshi Uehara
Shota Yasui
OffRL
41
53
0
26 Feb 2020
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurélien F. Bibaut
Ivana Malenica
N. Vlassis
Mark van der Laan
OOD
OffRL
27
40
0
13 Dec 2019
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
55
54
0
09 Jun 2019
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Michael Oberst
David Sontag
CML
OffRL
43
169
0
14 May 2019
Efficient Counterfactual Learning from Bandit Feedback
Yusuke Narita
Shota Yasui
Kohei Yata
OffRL
52
47
0
10 Sep 2018
More Robust Doubly Robust Off-policy Evaluation
Mehrdad Farajtabar
Yinlam Chow
Mohammad Ghavamzadeh
OffRL
51
267
0
10 Feb 2018
Offline A/B testing for Recommender Systems
Alexandre Gilotte
Clément Calauzènes
Thomas Nedelec
A. Abraham
Simon Dollé
OffRL
59
220
0
22 Jan 2018
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Yu Wang
Alekh Agarwal
Miroslav Dudík
OffRL
56
220
0
04 Dec 2016
Batched bandit problems
Vianney Perchet
Philippe Rigollet
Sylvain Chassang
E. Snowberg
OffRL
106
200
0
02 May 2015
Collaborative Filtering Bandits
Shuai Li
Alexandros Karatzoglou
Claudio Gentile
58
315
0
11 Feb 2015
Doubly Robust Policy Evaluation and Learning
Miroslav Dudík
John Langford
Lihong Li
OffRL
151
694
0
23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
150
574
0
31 Mar 2010
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li
Wei Chu
John Langford
Robert Schapire
267
2,935
0
28 Feb 2010
The Offset Tree for Learning with Partial Labels
A. Beygelzimer
John Langford
99
184
0
21 Dec 2008