arXiv:1805.05071

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

14 May 2018
Aurélien Garivier
Hédi Hadiji
Pierre Ménard
Gilles Stoltz
Abstract

We consider $K$-armed stochastic bandits and study cumulative regret bounds up to time $T$. We are interested in strategies that simultaneously achieve a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret bound that is asymptotically optimal, that is, one matching the $\kappa \ln T$ lower bound of Lai and Robbins (1985) and Burnetas and Katehakis (1996), where $\kappa$ is the optimal problem-dependent constant. This constant $\kappa$ depends on the model $\mathcal{D}$ considered (the family of possible distributions over the arms). Ménard and Garivier (2017) provided strategies achieving such bi-optimality in the parametric case of models given by one-dimensional exponential families, while Lattimore (2016, 2018) did so for the family of (sub)Gaussian distributions with variance less than $1$. We extend this result to the non-parametric case of all distributions over $[0,1]$. We do so by combining the MOSS strategy of Audibert and Bubeck (2009), which enjoys a distribution-free regret bound of optimal order $\sqrt{KT}$, with the KL-UCB strategy of Cappé et al. (2013), for which we provide in passing the first analysis of an optimal distribution-dependent $\kappa \ln T$ regret bound in the model of all distributions over $[0,1]$. We obtained this non-parametric bi-optimality result while working hard to streamline the proofs (both of previously known regret bounds and of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.
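To make the idea of combining a distribution-dependent index with a distribution-free one more concrete, here is a minimal Python sketch of an index policy that uses a KL-UCB-style index for arms pulled only a few times and a MOSS index afterwards. It rests on several assumptions that do not come from the abstract. The function names (`moss_index`, `kl_ucb_index`, `kl_ucb_switch_index`, `run`) are hypothetical. The KL-UCB index swaps in the Bernoulli divergence kl instead of the non-parametric divergence behind the $\kappa \ln T$ bound for all distributions over $[0,1]$. The exploration level $\ln_+(T/(Kn))$ and the switching threshold $\lfloor (T/K)^{1/5} \rfloor$ are illustrative choices. The sketch is therefore not claimed to reproduce the paper's strategy or its bi-optimal guarantees.

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-15
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def log_plus(x: float) -> float:
    """ln_+(x) = max(ln x, 0)."""
    return max(math.log(x), 0.0) if x > 0 else 0.0


def moss_index(mean: float, n: int, horizon: int, k: int) -> float:
    """MOSS index in the form of Audibert and Bubeck (2009): mean + sqrt(ln_+(T/(K n)) / n)."""
    return mean + math.sqrt(log_plus(horizon / (k * n)) / n)


def kl_ucb_index(mean: float, n: int, level: float, tol: float = 1e-9) -> float:
    """Largest q in [mean, 1] with n * kl(mean, q) <= level, found by bisection.

    Assumption: the Bernoulli kl is used in place of the paper's non-parametric divergence.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * bernoulli_kl(mean, mid) <= level:
            lo = mid  # mid still lies inside the confidence region
        else:
            hi = mid
    return lo


def kl_ucb_switch_index(mean: float, n: int, horizon: int, k: int) -> float:
    """Hypothetical switch: KL-UCB-style index for few pulls, MOSS index afterwards."""
    threshold = math.floor((horizon / k) ** 0.2)  # assumed threshold floor((T/K)^(1/5))
    if n <= threshold:
        # Assumed exploration level ln_+(T/(K n)), shared with the MOSS index.
        return kl_ucb_index(mean, n, log_plus(horizon / (k * n)))
    return moss_index(mean, n, horizon, k)


def run(means: list[float], horizon: int, seed: int = 0) -> tuple[list[int], float]:
    """Simulate the policy on Bernoulli arms; return pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialise the empirical means
        else:
            arm = max(
                range(k),
                key=lambda a: kl_ucb_switch_index(sums[a] / counts[a], counts[a], horizon, k),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    # Pseudo-regret: T times the best mean minus the expected reward of the arms pulled.
    regret = horizon * max(means) - sum(c * m for c, m in zip(counts, means))
    return counts, regret
```

For example, `run([0.9, 0.1], horizon=2000)` returns the pull counts and the pseudo-regret $\sum_a N_a(T)\,\Delta_a$ on a two-armed Bernoulli instance. Using the KL-based index only for the first few pulls of each arm and the MOSS index afterwards mirrors, in spirit, the pairing described in the abstract; the actual switching rule and indices analyzed in the paper may differ.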
