
Tight Bounds for Bandit Combinatorial Optimization

Abstract

We revisit the study of optimal regret rates in bandit combinatorial optimization---a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013), who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework and, in particular, imply a tight bound for the well-studied bandit shortest path problem. Thereby, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
