Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback



2 January 2019

arXiv

Papers citing "Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback"


Warm Starting of CMA-ES for Contextual Optimization Problems. Yuta Sekino, Kento Uchida, Shinichi Shirakawa. 18 Feb 2025.
Online Bandit Learning with Offline Preference Data for Improved RLHF. Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen. 13 Jun 2024.
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits. Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, Chao Yu. 30 Sep 2022.
Active Learning with Logged Data. Songbai Yan, Kamalika Chaudhuri, T. Javidi. 25 Feb 2018.
A Contextual Bandit Bake-off. A. Bietti, Alekh Agarwal, John Langford. 12 Feb 2018.
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Khanh Nguyen, Hal Daumé, Jordan L. Boyd-Graber. 24 Jul 2017.
Corralling a Band of Bandit Algorithms. Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert Schapire. 19 Dec 2016.
Conservative Contextual Linear Bandits. Abbas Kazerouni, Mohammad Ghavamzadeh, Y. Abbasi, Benjamin Van Roy. 19 Nov 2016.
Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation. Artem Sokolov, Stefan Riezler, Tanguy Urvoy. 18 Jan 2016.
Active Learning from Weak and Strong Labelers. Chicheng Zhang, Kamalika Chaudhuri. 09 Oct 2015.
Normalized Online Learning. Stéphane Ross, Paul Mineiro, John Langford. 09 Aug 2014.
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire. 04 Feb 2014.
Thompson Sampling for Contextual Bandits with Linear Payoffs. Shipra Agrawal, Navin Goyal. 15 Sep 2012.
Efficient Optimal Learning for Contextual Bandits. Miroslav Dudík, Daniel J. Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, L. Reyzin, Tong Zhang. 13 Jun 2011.
Online Importance Weight Aware Updates. Nikos Karampatziakis, John Langford. 06 Nov 2010.
Contextual Bandit Algorithms with Supervised Learning Guarantees. A. Beygelzimer, John Langford, Lihong Li, L. Reyzin, Robert Schapire. 22 Feb 2010.
Domain Adaptation: Learning Bounds and Algorithms. Yishay Mansour, M. Mohri, Afshin Rostamizadeh. 19 Feb 2009.