Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach relies on a master algorithm that selects its actions among candidate base algorithms. While this problem has been studied for specific classes of stochastic base algorithms, our objective is to provide a method that works with more general classes of stochastic base algorithms. We propose a master algorithm inspired by CORRAL \cite{DBLP:conf/colt/AgarwalLNS17} and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain regret guarantees for a wide class of base algorithms when they are run under our master. We exhibit a lower bound showing that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. We apply our algorithm to choose among different values of $\epsilon$ for the $\epsilon$-greedy algorithm, and to choose between the $k$-armed UCB and linear UCB algorithms. Our empirical studies further confirm the effectiveness of our model-selection method.
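To make the master/base setup concrete, here is a minimal Python sketch of a master that chooses among $\epsilon$-greedy base algorithms with different values of $\epsilon$. It is not the method of the paper: the master below uses a plain EXP3-style exponential-weights update rather than the CORRAL-style update, and it omits the smoothing transformation entirely. The names `EpsilonGreedy` and `run_master`, the Bernoulli reward model, and the choice of exploration rate `gamma` are all assumptions made for illustration.

```python
import math
import random


class EpsilonGreedy:
    """Hypothetical base algorithm: epsilon-greedy on a k-armed bandit."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental update of the empirical mean of the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_master(arm_means, epsilons, horizon, seed=0):
    """EXP3-style master over base learners (an assumption, not CORRAL)."""
    rng = random.Random(seed)
    bases = [EpsilonGreedy(len(arm_means), eps, rng) for eps in epsilons]
    m = len(bases)
    # Illustrative exploration rate; not tuned and not taken from the paper.
    gamma = min(1.0, math.sqrt(m * math.log(m) / horizon))
    weights = [1.0] * m
    total_reward = 0.0

    def distribution():
        total = sum(weights)
        return [(1 - gamma) * w / total + gamma / m for w in weights]

    for _ in range(horizon):
        probs = distribution()
        # The master samples one base; only that base acts and learns.
        i = rng.choices(range(m), weights=probs)[0]
        arm = bases[i].select_arm()
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        bases[i].update(arm, reward)
        # Importance-weighted reward estimate for the chosen base.
        weights[i] *= math.exp(gamma * (reward / probs[i]) / m)
        # Rescale so the largest weight is 1, which avoids overflow.
        top = max(weights)
        weights = [w / top for w in weights]
        total_reward += reward

    return total_reward, distribution()


if __name__ == "__main__":
    total, probs = run_master([0.2, 0.5, 0.8], [0.01, 0.1, 0.5], horizon=2000)
    print("total reward:", total)
    print("final master distribution over bases:", probs)
```

Because only the sampled base observes feedback in each round, every base learns more slowly than it would in isolation. That is the difficulty the smoothing transformation described in the abstract is meant to address, and this sketch does nothing to handle it.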