Towards Instance Optimal Bounds for Best Arm Identification

In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms; the $i$-th arm is associated with a reward distribution with an unknown mean $\mu_i$. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. Understanding the sample complexity of Best-$1$-Arm has attracted significant attention over the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance-wise sample complexity of Best-$1$-Arm. Given an instance $I$, let $\mu_{[i]}$ be the $i$-th largest mean and $\Delta_{[i]}=\mu_{[1]}-\mu_{[i]}$ be the corresponding gap; $H(I)=\sum_{i=2}^{n}\Delta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and that there is a $\delta$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\,\mathrm{polylog}(n,\delta^{-1})\right)\] samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ samples in expectation.
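To make the complexity measures concrete, the following is a hypothetical worked example on a small instance. The grouped form of $\mathsf{Ent}(I)$ used here (bucketing arms by gap scale $2^{-k}$ and taking an entropy over the groups' contributions to $H(I)$) is recalled from the gap-entropy conjecture literature and is an assumption of this sketch, not a definition stated in the abstract.

```latex
% A hypothetical 4-arm instance with gaps
% \Delta_{[2]} = \Delta_{[3]} = 1/4 and \Delta_{[4]} = 1/2:
\begin{align*}
  H(I) &= \sum_{i=2}^{n} \Delta_{[i]}^{-2} = 4^2 + 4^2 + 2^2 = 36, \\
  \mathsf{Ent}(I) &= \sum_{k} \frac{H_k}{H(I)} \ln \frac{H(I)}{H_k}
    = \frac{32}{36}\ln\frac{36}{32} + \frac{4}{36}\ln\frac{36}{4}
    \approx 0.35,
\end{align*}
% where H_k sums \Delta_{[i]}^{-2} over arms with gap
% \Delta_{[i]} \in [2^{-k}, 2^{-k+1}): here H_2 = 16 + 16 = 32 and H_1 = 4.
```

Note that $\mathsf{Ent}(I)$ is small when one gap scale dominates the complexity (as above) and grows when $H(I)$ is spread across many scales, which is what makes the conjectured bound instance-sensitive.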