Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

Abstract
We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=2$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.