Rotting Infinitely Many-armed Bandits

31 January 2022
Jung-hun Kim
Milan Vojnović
Se-Young Yun
Abstract

We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho = o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T, \sqrt{T}\})$ worst-case regret lower bound, where $T$ is the time horizon. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T, \sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to keep pulling an arm or remove it from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T, T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
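The abstract describes an algorithm that maintains a UCB index for the current arm and discards the arm (sampling a fresh one from the infinite reservoir) once the index falls below a threshold. The sketch below illustrates that control flow only; the specific index, threshold value $\max\{\varrho^{1/3}, 1/\sqrt{T}\}$, and rot correction used here are illustrative assumptions, not the paper's exact definitions.

```python
import math
import random

def rotting_ucb_threshold(pull, new_arm, T, varrho, delta=0.01):
    """Threshold-based UCB sketch for rotting infinitely many-armed bandits.

    pull(arm, n) -> noisy reward of `arm` on its (n+1)-th pull.
    new_arm()    -> a fresh arm sampled from the reservoir.
    T            -> time horizon; varrho -> maximum rotting rate.

    The index and threshold below are assumed forms for illustration,
    not the construction from the paper.
    """
    # Assumed threshold, motivated by the regret bound's varrho^{1/3} term.
    threshold = max(varrho ** (1 / 3), 1.0 / math.sqrt(T))
    total = 0.0
    t = 0
    while t < T:
        arm = new_arm()
        rewards = []
        while t < T:
            r = pull(arm, len(rewards))
            rewards.append(r)
            total += r
            t += 1
            n = len(rewards)
            mean = sum(rewards) / n
            # Standard confidence radius from a Hoeffding-style bound.
            radius = math.sqrt(2 * math.log(1 / delta) / n)
            # Optimistic estimate, corrected for the worst-case rot the
            # arm may have accrued over its n pulls so far.
            ucb = mean + radius - n * varrho
            if ucb < 1.0 - threshold:
                break  # discard arm; sample a fresh one from the reservoir
    return total
```

A toy usage, with arm means drawn uniformly from $[0,1]$ and each pull rotting the mean by exactly $\varrho$:

```python
random.seed(0)
T, varrho = 2000, 1e-4
new_arm = lambda: {"mu": random.random()}
pull = lambda arm, n: arm["mu"] - n * varrho + random.gauss(0, 0.1)
total = rotting_ucb_threshold(pull, new_arm, T, varrho)
```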
