Linear Bandits on Uniformly Convex Sets

Abstract

Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$, and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, §5.5]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.
