
Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances

Abstract

We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls required to identify, with probability at least $1-\delta$, $m$ arms out of the $n$ arms whose means are the largest within error $\epsilon$. We show that the worst-case sample complexity of this problem is
$$\Theta\left( \sum_{i=1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right),$$
where $G^{m}, G^{l}, G^{r}$ are certain subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function that measures the heterogeneity of the variance proxies. The upper bound is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation.
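As a rough point of comparison for the first term of the bound, the sketch below implements a naive baseline, not the paper's divide-and-conquer algorithm: pull each arm enough times that every empirical mean lies within $\epsilon/2$ of its true mean with probability at least $1-\delta$ (sub-Gaussian tail bound plus a union bound over the $n$ arms), then return the $m$ largest empirical means. It assumes Gaussian rewards with known variances $\sigma_i^2$ (a special case of $\sigma_i^2$-sub-Gaussian) and the common $\epsilon$-approximate criterion that every returned arm's mean is at least the $m$-th largest mean minus $\epsilon$; the paper's exact criterion may differ. The function names `naive_pull_budget` and `naive_top_m` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def naive_pull_budget(sigma2: Sequence[float], eps: float, delta: float) -> List[int]:
    """Hypothetical helper: pulls per arm so that, with probability >= 1 - delta,
    every empirical mean is within eps/2 of its true mean.

    For N pulls of a sigma^2-sub-Gaussian arm,
    P(|mean_N - mu| >= eps/2) <= 2 * exp(-N * eps^2 / (8 * sigma^2)).
    Requiring this to be <= delta / n (union bound over n arms) gives
    N >= 8 * sigma^2 / eps^2 * ln(2n / delta).
    """
    n = len(sigma2)
    log_term = math.log(2 * n / delta)
    return [math.ceil(8 * s2 / eps**2 * log_term) for s2 in sigma2]


def naive_top_m(means: Sequence[float], sigma2: Sequence[float], m: int,
                eps: float, delta: float, seed: int = 0) -> List[int]:
    """Hypothetical naive baseline: uniform-confidence sampling, then top-m by empirical mean.

    If all empirical means are within eps/2 of the truth, every returned arm has
    true mean >= (m-th largest true mean) - eps.
    """
    rng = random.Random(seed)
    budget = naive_pull_budget(sigma2, eps, delta)
    empirical = []
    for mu, s2, pulls in zip(means, sigma2, budget):
        # Assumption: Gaussian rewards with variance s2 (these are s2-sub-Gaussian).
        total = sum(rng.gauss(mu, math.sqrt(s2)) for _ in range(pulls))
        empirical.append(total / pulls)
    # Indices of the m largest empirical means.
    return sorted(range(len(means)), key=lambda i: empirical[i], reverse=True)[:m]
```

For example, `naive_pull_budget([1.0, 0.25, 4.0], 0.5, 0.1)` gives `[132, 33, 525]`: the per-arm cost scales with $\sigma_i^2/\epsilon^2$, so low-variance arms are cheap. However, every arm pays $\ln(2n/\delta)$ rather than $\ln(1/\delta)$, whereas the bound above charges the extra $\ln(m)$ or entropy-like factor only to the subsets $G^{m}$ and $G^{l}$.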
