Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

16 October 2012

Papers citing "Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits"

Title
Second Order Bounds for Contextual Bandits with Function Approximation | Aldo Pacchiano | 199 4 0 | 24 Sep 2024
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces | Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba | OffRL | 131 5 0 | 22 Feb 2024
Doubly Robust Policy Evaluation and Learning | Miroslav Dudík, John Langford, Lihong Li | OffRL | 299 697 0 | 23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms | Lihong Li, Wei Chu, John Langford, Xuanhui Wang | OffRL | 201 575 0 | 31 Mar 2010
A Contextual-Bandit Approach to Personalized News Article Recommendation | Lihong Li, Wei Chu, John Langford, Robert Schapire | 417 2,944 0 | 28 Feb 2010
Learning from Logged Implicit Exploration Data | Alexander L. Strehl, John Langford, Sham Kakade, Lihong Li | OffRL | 171 255 0 | 27 Feb 2010
Contextual Bandit Algorithms with Supervised Learning Guarantees | A. Beygelzimer, John Langford, Lihong Li, L. Reyzin, Robert Schapire | OffRL | 187 324 0 | 22 Feb 2010
The Offset Tree for Learning with Partial Labels | A. Beygelzimer, John Langford | 262 184 0 | 21 Dec 2008