We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega\!\left(\frac{\log(1/\Delta)}{\log\log(1/\Delta)}\right)$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(1/\Delta))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that uses only $O(1)$ memory, and answers an open question posed by Assadi and Wang [STOC'20].
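To make the streaming setting concrete, below is a minimal, hypothetical Python sketch in the spirit of the $O(1)$-memory, $O(\log(1/\Delta))$-pass upper bound of Jin et al.: each pass keeps a single candidate ("king") arm in memory, lets every arriving arm challenge it with enough pulls to resolve gaps of size $\varepsilon$, and halves $\varepsilon$ between passes, for roughly $O(n/\varepsilon^2)$ samples per pass. Unlike the actual algorithm, this sketch assumes a known lower bound `delta_guess` on $\Delta$; all names (`one_pass`, `find_best_arm`, etc.) are illustrative, not from the paper.

```python
import random

def sample_mean(arm, t, rng):
    """Empirical mean of t Bernoulli pulls from an arm with success probability `arm`."""
    return sum(rng.random() < arm for _ in range(t)) / t

def one_pass(stream, king, eps, rng):
    """One streaming pass: each arriving arm challenges the current king.

    Only O(1) arms are held in memory at any time; each challenge uses
    O(1/eps^2) pulls, so the pass uses O(n/eps^2) samples in total.
    """
    t = max(1, int(4 / eps**2))  # pulls per arm, enough to resolve gaps of size eps
    for arm in stream:
        if sample_mean(arm, t, rng) > sample_mean(king, t, rng) + eps / 2:
            king = arm  # challenger wins; the old king is evicted from memory
    return king

def find_best_arm(arms, delta_guess, seed=0):
    """Multi-pass sketch (assumes a known lower bound delta_guess on the gap):
    halve the target gap eps each pass until it drops below delta_guess,
    giving O(log(1/delta_guess)) passes overall."""
    rng = random.Random(seed)
    king, eps = arms[0], 0.5
    while eps >= delta_guess:
        king = one_pass(arms, king, eps, rng)
        eps /= 2
    return king

arms = [0.5, 0.6, 0.55, 0.9, 0.7]  # true means; 0.9 is the best arm
print(find_best_arm(arms, delta_guess=0.1))
```

The lower bound in this paper says that, up to the $\log\log(1/\Delta)$ factor, the number of passes in a scheme of this shape cannot be improved while keeping the optimal $O(n/\Delta^2)$ sample complexity and sublinear memory.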