Towards Instance Optimal Bounds for Best Arm Identification

Abstract

In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. Understanding the sample complexity of Best-$1$-Arm has attracted significant attention over the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance sample complexity of Best-$1$-Arm. Given an instance $I$, let $\mu_{[i]}$ denote the $i$-th largest mean and let $\Delta_{[i]}=\mu_{[1]}-\mu_{[i]}$ be the corresponding gap; $H(I)=\sum_{i=2}^n\Delta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and that there is a $\delta$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)+\Delta_{[2]}^{-2}\ln\ln\Delta_{[2]}^{-1}\,\mathrm{polylog}(n,\delta^{-1})\right)\] samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $\delta$-correct monotone algorithm requires $\Omega\left(H(I)\cdot\left(\ln\delta^{-1}+\mathsf{Ent}(I)\right)\right)$ samples in expectation.
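To make the instance complexity concrete, here is a minimal Python sketch that computes $H(I)=\sum_{i=2}^n\Delta_{[i]}^{-2}$ from a list of arm means. The function name `instance_complexity` and the example instance (with gaps of the form $2^{-k}$, as in the paper's lower bound) are illustrative choices, not taken from the paper.

```python
def instance_complexity(means):
    """Compute H(I) = sum_{i=2}^n Delta_[i]^{-2}, where
    Delta_[i] = mu_[1] - mu_[i] is the gap of the i-th largest mean.
    Assumes the best arm is unique (all gaps for i >= 2 are positive)."""
    mu = sorted(means, reverse=True)  # mu[0] is the largest mean mu_[1]
    return sum((mu[0] - m) ** -2 for m in mu[1:])

# Example instance with gaps 2^-1, 2^-2, 2^-3:
# H(I) = 4 + 16 + 64 = 84
means = [1.0, 1.0 - 2**-1, 1.0 - 2**-2, 1.0 - 2**-3]
print(instance_complexity(means))  # 84.0
```

A smaller gap contributes quadratically more to $H(I)$, which is why the hardest-to-distinguish arms dominate the sample complexity.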
