Bounded regret in stochastic multi-armed bandits

Abstract

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$.
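
For reference, the quantities named in the abstract can be written out as follows. This is a sketch in standard bandit notation, not taken from the abstract itself: the number of arms $K$, the arm means $\mu_i$, the arm $I_t$ played at round $t$, and the horizon $n$ are assumed symbols.

$$
\mu^{(\star)} = \max_{1 \le i \le K} \mu_i, \qquad
\Delta_i = \mu^{(\star)} - \mu_i, \qquad
\Delta = \min_{i \,:\, \Delta_i > 0} \Delta_i, \qquad
R_n = n\,\mu^{(\star)} - \mathbb{E}\sum_{t=1}^{n} \mu_{I_t}.
$$

Under this reading, a regret that is uniformly bounded over time corresponds to $\sup_{n \ge 1} R_n < \infty$, that is, a bound that does not grow with the horizon $n$.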
