14
4

Active clustering with bandit feedback

Abstract

We investigate the Active Clustering Problem (ACP). A learner interacts with an NN-armed stochastic bandit with dd-dimensional subGaussian feedback. There exists a hidden partition of the arms into KK groups, such that arms within the same group, share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget - i.e., the least number of observation - and with a probability of error smaller than a prescribed constant δ\delta. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the active setting.

View on arXiv
Comments on this paper