Approximate Top- Arm Identification with Heterogeneous Reward Variances

We study the effect of reward variance heterogeneity in the approximate top- arm identification setting. In this setting, the reward for the -th arm follows a -sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify arms with the largest means within error out of the arms, with probability at least . We show that the worst-case sample complexity of this problem is \Theta\left( \sum_{i =1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right), where are certain specific subsets of the overall arm set , and is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.
View on arXiv