arXiv:2405.19752

Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits

30 May 2024
Yuchen He
Zichun Ye
Chihao Zhang
Abstract

We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are presented in a stream, and at most $m < n$ arms and their statistics can be stored in memory. We give a complete characterization of the optimal regret in terms of $m$, $n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^{P}}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^{P}}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.
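
To make the dependence on $P$ concrete, the following minimal sketch evaluates the three exponents in the regret rate stated above as exact fractions. The function names `regret_exponents` and `regret_rate` are illustrative and do not come from the paper. The sketch reproduces only the polynomial part of the bound: the constants and logarithmic factors hidden by $\tilde O$ and $\tilde \Omega$ are ignored, and the paper's algorithm itself is not implemented here.

```python
from fractions import Fraction


def regret_exponents(P: int):
    """Exact exponents (a, b, c) with rate = (n - m)^a * n^b * T^c.

    Taken term by term from the bound in the abstract; P is the
    number of passes over the stream.
    """
    if P < 1:
        raise ValueError("P must be a positive integer")
    d = 2 ** (P + 1) - 1                 # common denominator 2^(P+1) - 1
    a = 1 + Fraction(2 ** P - 2, d)      # exponent of (n - m)
    b = Fraction(2 - 2 ** (P + 1), d)    # exponent of n
    c = Fraction(2 ** P, d)              # exponent of T
    return a, b, c


def regret_rate(n: int, m: int, T: int, P: int) -> float:
    """Polynomial part of the bound only (no constants, no log factors)."""
    if not 0 < m < n:
        raise ValueError("the model assumes 0 < m < n")
    a, b, c = regret_exponents(P)
    return (n - m) ** float(a) * n ** float(b) * T ** float(c)


if __name__ == "__main__":
    for P in (1, 2, 3, 10):
        a, b, c = regret_exponents(P)
        print(f"P={P}: (n-m)^({a}) * n^({b}) * T^({c})")
```

With a single pass ($P = 1$) the rate reduces to $(n-m)\, n^{-2/3}\, T^{2/3}$. As $P$ grows, the exponents approach $3/2$, $-1$ and $1/2$ respectively, so the dependence on $T$ tends toward $\sqrt{T}$.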
