ResearchTrend.AI

arXiv:1906.10454
Restless dependent bandits with fading memory

25 June 2019
O. Zadorozhnyi
Gilles Blanchard
Alexandra Carpentier
Abstract

We study the stochastic multi-armed bandit problem in the case when the arm samples are dependent over time and generated from so-called weak $\mathcal{C}$-mixing processes. We establish a $\mathcal{C}$-Mix Improved UCB algorithm and provide both problem-dependent and problem-independent regret analyses in two different scenarios. In the first, so-called fast-mixing scenario, we show that the pseudo-regret enjoys the same upper bound (up to a factor) as for i.i.d. observations; whereas in the second, slow-mixing scenario, we discover a surprising effect: the regret upper bound is similar to the independent case, with an incremental \emph{additive} term which does not depend on the number of arms. The analysis of the slow-mixing scenario is supported by a minimax lower bound, which (up to a $\log(T)$ factor) matches the obtained upper bound.
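The abstract does not specify the exact form of the $\mathcal{C}$-Mix Improved UCB index, so the following is only a minimal sketch of a generic UCB-style policy with a hypothetical additive inflation of the confidence radius (`mixing_bonus`) standing in for a dependence correction; the bonus form, the arm distributions, and all names here are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def ucb_mix(reward_fns, T, mixing_bonus=0.0, seed=0):
    """Generic UCB-style index policy.

    `mixing_bonus` is a hypothetical additive term inflating the
    confidence radius, loosely standing in for a correction one might
    apply under dependent (mixing) observations. With mixing_bonus=0
    this reduces to standard UCB1.
    """
    rng = random.Random(seed)
    K = len(reward_fns)
    counts = [0] * K      # pulls per arm
    means = [0.0] * K     # running empirical means
    total_reward = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1     # pull each arm once to initialize
        else:
            # pick the arm maximizing mean + confidence radius + bonus
            a = max(range(K), key=lambda i: means[i]
                    + math.sqrt(2 * math.log(t) / counts[i])
                    + mixing_bonus)
        r = reward_fns[a](rng)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total_reward += r
    return counts, total_reward

# Illustrative run: two Bernoulli arms with means 0.3 and 0.7.
arms = [lambda rng: float(rng.random() < 0.3),
        lambda rng: float(rng.random() < 0.7)]
counts, reward = ucb_mix(arms, T=2000, seed=1)
```

With a fixed seed the run is deterministic; after 2000 rounds the policy concentrates its pulls on the better arm, with the suboptimal arm pulled only O(log T) times, consistent with the i.i.d.-style regret guarantees the abstract refers to.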
