Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

We consider combinatorial semi-bandits over a set of arms where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound , but it has computational complexity which is typically exponential in , and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret and computational complexity . Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time by repeatedly maximizing a linear function over subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.
View on arXiv