Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

Abstract

We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
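
To make the two endpoints of the loss family concrete, the following minimal sketch evaluates hinge loss and squared hinge loss on a margin `z`, together with one possible interpolation between them. The abstract does not state the parameterization, so the function `loss_eta` and its formula are assumptions chosen only so that `eta = 0` recovers hinge loss and `eta = 1` recovers squared hinge loss; the paper's actual family may differ.

```python
def hinge(z: float) -> float:
    """Hinge loss on a margin z."""
    return max(0.0, 1.0 - z)


def squared_hinge(z: float) -> float:
    """Squared hinge loss on a margin z."""
    return max(0.0, 1.0 - z) ** 2


def loss_eta(z: float, eta: float) -> float:
    """Hypothetical interpolating family (an assumption, not taken from the abstract).

    Chosen so that eta = 0 gives hinge loss, eta = 1 gives squared hinge loss,
    and the loss is zero for every margin z >= 1.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if z >= 1.0:
        return 0.0
    return 1.0 - (2.0 / (2.0 - eta)) * z + (eta / (2.0 - eta)) * z * z


if __name__ == "__main__":
    # Compare the endpoints and one intermediate value on a few margins.
    for z in (-1.0, 0.0, 0.5, 1.0):
        print(z, hinge(z), loss_eta(z, 0.0), loss_eta(z, 0.5),
              loss_eta(z, 1.0), squared_hinge(z))
```

Under this assumed form, intermediate values of `eta` give a loss that is linear in the margin near `eta = 0` and increasingly quadratic as `eta` approaches 1.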
