arXiv:2309.03145

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

6 September 2023
Sepehr Assadi
Chen Wang
Abstract

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC'20].
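
To get a feel for how close the two pass bounds are, the sketch below evaluates the expressions inside the $\Omega(\cdot)$ and $O(\cdot)$ for a given gap $\Delta$. It is only an illustration: every hidden constant is dropped, natural logarithms are assumed, and the function names `pass_bounds` and `sample_budget` are hypothetical, not taken from the paper.

```python
import math


def pass_bounds(delta: float) -> tuple[float, float]:
    """Return (lower, upper) pass expressions with all constants dropped.

    lower = log(1/delta) / log log(1/delta)   (inside the Omega bound)
    upper = log(1/delta)                      (inside the O bound)
    Natural logs are an assumption made for this illustration.
    """
    # log log(1/delta) is positive only when 1/delta > e.
    if not 0.0 < delta < 1.0 / math.e:
        raise ValueError("delta must lie in (0, 1/e)")
    log_inv = math.log(1.0 / delta)
    return log_inv / math.log(log_inv), log_inv


def sample_budget(n: int, delta: float) -> float:
    """Return n / delta^2, the expression inside the O(n / Delta^2) sample bound."""
    return n / delta ** 2


if __name__ == "__main__":
    for delta in (1e-1, 1e-3, 1e-6):
        lower, upper = pass_bounds(delta)
        print(f"delta={delta:g}  lower~{lower:.2f}  upper~{upper:.2f}  "
              f"ratio~{upper / lower:.2f}")
```

The ratio of the two expressions is exactly $\log\log(1/\Delta)$, which is the "lower order terms" gap mentioned in the abstract.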
