Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits
We study the problem of minimizing gap-dependent regret for single-pass streaming stochastic multi-armed bandits (MAB). In this problem, the $n$ arms arrive in a stream, and at most $m$ arms and their statistics can be stored in memory. We establish tight non-asymptotic regret bounds in terms of all relevant parameters: the number of arms $n$, the memory size $m$, the number of rounds $T$, and the gaps $\Delta_i$, where $\Delta_i$ is the gap in mean reward between the best arm and the $i$-th arm. These gaps are not known to the player in advance. Specifically, for any fixed value of a constant parameter, we present two algorithms, each applicable to a different range of the memory size and each with its own regret upper bound. We also prove matching lower bounds for both cases by showing that there exists a set of hard instances on which every algorithm incurs regret of the same order. This is the first tight gap-dependent regret bound for streaming MAB. Prior to our work, an upper bound for a special case was established by Agarwal, Khanna and Patil (COLT'22). In contrast, our results provide the correct order of regret.
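To make the setting concrete, the following is a minimal sketch of the single-pass, bounded-memory constraint only. It is not one of the algorithms from the paper: the eviction rule (drop the stored arm with the lowest empirical mean), the fixed `pulls_per_arm` budget, the Bernoulli rewards, and the name `naive_streaming_regret` are all assumptions made for illustration. The sketch tracks pseudo-regret, that is, the sum over pulls of the gap between the best mean and the pulled arm's mean.

```python
import random


def naive_streaming_regret(means, m, T, pulls_per_arm, seed=0):
    """Pseudo-regret of a naive single-pass baseline (illustrative only).

    means         -- true Bernoulli means, in stream order (hypothetical input)
    m             -- memory size: max number of arms stored at once (m >= 2)
    T             -- total number of rounds
    pulls_per_arm -- exploration pulls given to each arriving arm (assumption)
    """
    if m < 2:
        raise ValueError("this sketch needs m >= 2 to keep a champion")
    if pulls_per_arm < 1:
        raise ValueError("pulls_per_arm must be at least 1")

    rng = random.Random(seed)
    best = max(means)
    memory = {}  # arm index -> [reward sum, pull count]; never more than m entries
    regret = 0.0
    t = 0

    for i, mu in enumerate(means):  # each arm is read from the stream exactly once
        if t >= T:
            break
        if len(memory) == m:
            # Free a slot: an evicted arm is gone for good in a single pass.
            worst = min(memory, key=lambda a: memory[a][0] / memory[a][1])
            del memory[worst]
        memory[i] = [0.0, 0]
        for _ in range(min(pulls_per_arm, T - t)):
            reward = 1.0 if rng.random() < mu else 0.0
            memory[i][0] += reward
            memory[i][1] += 1
            regret += best - mu  # gap of the arm just pulled
            t += 1

    if t < T:
        # Stream exhausted: commit to the stored arm with the best empirical mean.
        champion = max(memory, key=lambda a: memory[a][0] / memory[a][1])
        regret += (T - t) * (best - means[champion])
    return regret
```

For example, `naive_streaming_regret([0.0, 1.0, 0.0], m=2, T=100, pulls_per_arm=10)` spends ten rounds on each of the two zero-mean arms and then commits to the arm with mean 1. A fixed exploration budget like this ignores the unknown gaps, which is exactly what a gap-dependent analysis has to account for.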
@article{ye2025_2503.02428,
  title={Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits},
  author={Zichun Ye and Chihao Zhang and Jiahao Zhao},
  journal={arXiv preprint arXiv:2503.02428},
  year={2025}
}