Ensemble sampling for linear bandits: small ensembles suffice

Abstract

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Our result is also the first to allow for infinite action sets.
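
To give a feel for the scalings quoted above, the following minimal Python sketch evaluates the order-of-magnitude ensemble size $d \log T$ and the regret rate $(d \log T)^{5/2} \sqrt{T}$. It is only an illustration: the abstract states orders, not constants, so all constants are assumed to be 1 here, the natural logarithm is assumed, and the function names `ensemble_size` and `regret_rate` are hypothetical.

```python
import math


def ensemble_size(d: int, T: int) -> int:
    """Ensemble size of order d * log(T); the leading constant is assumed to be 1."""
    return math.ceil(d * math.log(T))


def regret_rate(d: int, T: int) -> float:
    """Regret rate (d * log(T))^(5/2) * sqrt(T); the leading constant is assumed to be 1."""
    return (d * math.log(T)) ** 2.5 * math.sqrt(T)


if __name__ == "__main__":
    d, T = 10, 1_000_000  # illustrative values, not taken from the paper
    m = ensemble_size(d, T)
    print(f"ensemble size ~ {m}, compared with a horizon of T = {T}")
    print(f"regret rate ~ {regret_rate(d, T):.3e}")
```

Under these assumptions the ensemble size grows only logarithmically in $T$, which is the contrast with a linear-in-$T$ ensemble that the abstract draws.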
