Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits

Abstract

We study the problem of minimizing gap-dependent regret for single-pass streaming stochastic multi-armed bandits (MAB). In this problem, the $n$ arms arrive in a stream, and at most $m<n$ arms and their statistics can be stored in memory. We establish tight non-asymptotic regret bounds in terms of all relevant parameters: the number of arms $n$, the memory size $m$, the number of rounds $T$, and the gaps $(\Delta_i)_{i\in [n]}$, where $\Delta_i$ is the gap between the mean reward of the best arm and that of the $i$-th arm. These gaps are not known to the player in advance. Specifically, for any constant $\alpha \ge 1$, we present two algorithms: one applicable for $m\ge \frac{2}{3}n$ with regret at most $O_\alpha\Big(\frac{(n-m)T^{\frac{1}{\alpha + 1}}}{n^{1 + \frac{1}{\alpha + 1}}}\sum_{i:\Delta_i > 0}\Delta_i^{1 - 2\alpha}\Big)$, and another applicable for $m<\frac{2}{3}n$ with regret at most $O_\alpha\Big(\frac{T^{\frac{1}{\alpha+1}}}{m^{\frac{1}{\alpha+1}}}\sum_{i:\Delta_i > 0}\Delta_i^{1 - 2\alpha}\Big)$. We also prove matching lower bounds for both cases by showing that for any constant $\alpha\ge 1$ and any $m\leq k < n$, there exists a set of hard instances on which the regret of any algorithm is $\Omega_\alpha\Big(\frac{(k-m+1) T^{\frac{1}{\alpha+1}}}{k^{1 + \frac{1}{\alpha+1}}} \sum_{i:\Delta_i > 0}\Delta_i^{1-2\alpha}\Big)$. This is the first tight gap-dependent regret bound for streaming MAB. Prior to our work, an $O\Big(\sum_{i:\Delta_i>0} \frac{\sqrt{T}\log T}{\Delta_i}\Big)$ upper bound for the special case of $\alpha=1$ and $m=O(1)$ was established by Agarwal, Khanna and Patil (COLT'22). In contrast, our results provide the correct order of regret as $\Theta\Big(\frac{1}{\sqrt{m}}\sum_{i:\Delta_i>0}\frac{\sqrt{T}}{\Delta_i}\Big)$.
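
As a worked check, setting $\alpha = 1$ in the two upper bounds stated in the abstract gives the exponents $\frac{1}{\alpha+1} = \frac{1}{2}$ and $1 - 2\alpha = -1$. The sketch below uses only the formulas in the abstract, not the paper's proofs; the comparison at the threshold $m = \frac{2}{3}n$ is an added sanity check and assumes nothing beyond substituting that value of $m$.

```latex
% Large-memory regime (m >= 2n/3), alpha = 1:
\[
O\Big(\frac{(n-m)\,T^{1/2}}{n^{3/2}} \sum_{i:\Delta_i>0} \frac{1}{\Delta_i}\Big)
\]

% Small-memory regime (m < 2n/3), alpha = 1:
\[
O\Big(\frac{T^{1/2}}{m^{1/2}} \sum_{i:\Delta_i>0} \frac{1}{\Delta_i}\Big)
= O\Big(\frac{1}{\sqrt{m}} \sum_{i:\Delta_i>0} \frac{\sqrt{T}}{\Delta_i}\Big)
\]

% Prefactors of the two bounds at the threshold m = 2n/3:
\[
\frac{n-m}{n^{3/2}} = \frac{n/3}{n^{3/2}} = \frac{1}{3\sqrt{n}},
\qquad
\frac{1}{\sqrt{m}} = \sqrt{\frac{3}{2}}\cdot\frac{1}{\sqrt{n}}
\]
```

Both prefactors are $\Theta(1/\sqrt{n})$ at $m = \frac{2}{3}n$, so the two regimes agree up to constant factors where they meet, and the small-memory expression has the same form as the $\Theta\big(\frac{1}{\sqrt{m}}\sum_{i:\Delta_i>0}\frac{\sqrt{T}}{\Delta_i}\big)$ rate quoted for $\alpha = 1$.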

@article{ye2025_2503.02428,
  title={Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits},
  author={Zichun Ye and Chihao Zhang and Jiahao Zhao},
  journal={arXiv preprint arXiv:2503.02428},
  year={2025}
}