We study the stochastic multi-armed bandit problem in the -pass streaming model. In this problem, the arms are present in a stream and at most arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of and . Specifically, we design an algorithm with regret and complement it with an lower bound when the number of rounds is sufficiently large. Our results are tight up to a logarithmic factor in and .