arXiv:1712.09007

Stochastic Multi-armed Bandits in Constant Space

25 December 2017
David Liau
Eric Price
Zhao Song
Ger Yang
Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all \(K\) arms. We give an algorithm using \(O(1)\) words of space with regret \[ \sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T \] where \(\Delta_i\) is the gap between the best arm and arm \(i\) and \(\Delta\) is the gap between the best and the second-best arms. If the rewards are bounded away from \(0\) and \(1\), this is within an \(O(\log 1/\Delta)\) factor of the optimum regret possible without space constraints.
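
As a rough numerical illustration of the last claim, the sketch below evaluates the stated regret expression for a hypothetical set of gaps and compares it with the familiar \(\sum_i \frac{1}{\Delta_i}\log T\) form of the unconstrained bound. It is not the paper's algorithm. The function names, the example gaps, and the horizon are assumptions made for illustration; constants hidden by the asymptotic notation are ignored, and the sum is taken over suboptimal arms only, since the best arm has gap zero.

```python
import math

def constant_space_bound(gaps, T):
    """Evaluate sum_i (1/gap_i) * log(gap_i / gap_min) * log(T).

    `gaps` holds the gaps of the suboptimal arms only (an assumption:
    the best arm has gap 0 and is left out). Constants are ignored.
    """
    delta = min(gaps)  # gap between the best and second-best arms
    return sum(math.log(g / delta) / g for g in gaps) * math.log(T)

def unconstrained_bound(gaps, T):
    """Evaluate the familiar sum_i (1/gap_i) * log(T) form, constants ignored."""
    return sum(1.0 / g for g in gaps) * math.log(T)

# Hypothetical instance: four suboptimal arms, horizon of one million pulls.
gaps = [0.05, 0.1, 0.2, 0.4]
T = 10**6

ratio = constant_space_bound(gaps, T) / unconstrained_bound(gaps, T)
factor = math.log(1.0 / min(gaps))  # the log(1/Delta) factor from the abstract

print(f"ratio of the two expressions: {ratio:.3f}")
print(f"log(1/Delta):                 {factor:.3f}")
```

Because every gap is at most 1, each term satisfies \(\log(\Delta_i/\Delta) \le \log(1/\Delta)\), so the ratio computed here never exceeds \(\log(1/\Delta)\), which matches the factor quoted in the abstract.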
