
arXiv:1209.2693

Regret Bounds for Restless Markov Bandits

12 September 2012
R. Ortner
D. Ryabko
P. Auer
Rémi Munos
Abstract

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
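
To make the setting concrete, here is a minimal simulation sketch of a restless bandit. It is not the algorithm from the paper. The class `RestlessArm`, the function `simulate`, and the two-state chains are hypothetical names and values chosen for illustration. The one property it shows is the "restless" one: every arm's state advances at every step, whether or not that arm was pulled.

```python
import random


class RestlessArm:
    """Hypothetical arm whose state follows a Markov chain (illustration only)."""

    def __init__(self, transition, rewards, state=0):
        self.transition = transition  # transition[s][s2] = P(next = s2 | current = s)
        self.rewards = rewards        # rewards[s] = reward received when pulled in state s
        self.state = state

    def step(self, rng):
        # Advance the chain one step; called for every arm at every time step.
        row = self.transition[self.state]
        self.state = rng.choices(range(len(row)), weights=row)[0]


def simulate(arms, policy, horizon, seed=0):
    """Run `policy` (a function t -> arm index) for `horizon` steps; return total reward."""
    rng = random.Random(seed)
    total = 0.0
    for t in range(horizon):
        chosen = policy(t)
        total += arms[chosen].rewards[arms[chosen].state]
        # Restless: all arms evolve, independently of which one was chosen.
        for arm in arms:
            arm.step(rng)
    return total


# Assumed toy instance: two irreducible chains that deterministically flip state.
FLIP = [[0.0, 1.0], [1.0, 0.0]]


def make_arms():
    return [RestlessArm(FLIP, [0.0, 1.0], state=0),
            RestlessArm(FLIP, [0.0, 1.0], state=1)]


always_first = simulate(make_arms(), lambda t: 0, horizon=10)
alternating = simulate(make_arms(), lambda t: 1 if t % 2 == 0 else 0, horizon=10)
print(always_first, alternating)
```

In this toy instance, a policy that sticks to one arm collects reward only on every other step, while a policy that switches arms in step with the chains collects reward on every step. This is the kind of comparison behind measuring regret against the best policy rather than against the best single arm.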
