arXiv:2203.12577 (v3, latest)

Minimax Regret for Cascading Bandits

23 March 2022
Daniel Vial
Sujay Sanghavi
Sanjay Shakkottai
R. Srikant
Abstract

Cascading bandits model the task of learning to rank $K$ out of $L$ items over $n$ rounds of partial feedback. For this model, the minimax (i.e., gap-free) regret is poorly understood; in particular, the best known lower and upper bounds are $\Omega(\sqrt{nL/K})$ and $\tilde{O}(\sqrt{nLK})$, respectively. We improve the lower bound to $\Omega(\sqrt{nL})$ and show CascadeKL-UCB (which ranks items by their KL-UCB indices) attains it up to log terms. Surprisingly, we also show CascadeUCB1 (which ranks via UCB1) can suffer suboptimal $\Omega(\sqrt{nLK})$ regret. This sharply contrasts with standard $L$-armed bandits, where the corresponding algorithms both achieve the minimax regret $\sqrt{nL}$ (up to log terms), and the main advantage of KL-UCB is only to improve constants in the gap-dependent bounds. In essence, this contrast occurs because Pinsker's inequality is tight for hard problems in the $L$-armed case but loose (by a factor of $K$) in the cascading case.
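
To make the two ranking rules in the abstract concrete, here is a minimal Python sketch run against a toy Bernoulli click simulator. The simulator, the exploration constant 1.5, the bisection depth, and the instance in the __main__ block are illustrative assumptions, not details taken from the paper.

import math
import random

def kl(p, q):
    # Bernoulli KL divergence d(p, q), clipped away from {0, 1} for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def ucb1_index(p_hat, pulls, t):
    # UCB1 index (CascadeUCB1): empirical mean plus a Hoeffding-style bonus.
    return p_hat + math.sqrt(1.5 * math.log(t) / pulls)

def kl_ucb_index(p_hat, pulls, t):
    # KL-UCB index (CascadeKL-UCB): largest q with pulls * d(p_hat, q) <= log t,
    # found by bisection on [p_hat, 1].
    target = math.log(t) / pulls
    lo, hi = p_hat, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl(p_hat, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def run(w, K, n, index_fn, seed=0):
    # Simulate n rounds of cascading feedback for click probabilities w.
    rng = random.Random(seed)
    L = len(w)
    pulls, clicks = [0] * L, [0] * L
    best = 1.0
    for p in sorted(w, reverse=True)[:K]:
        best *= 1 - p
    best = 1 - best                       # expected reward of the optimal list
    regret = 0.0
    for t in range(1, n + 1):
        # Rank the K items with the highest indices; unseen items stay optimistic.
        idx = [index_fn(clicks[i] / pulls[i], pulls[i], t) if pulls[i] else 1.0
               for i in range(L)]
        ranked = sorted(range(L), key=lambda i: -idx[i])[:K]
        # The user scans the list in order and clicks the first attractive item;
        # items below the click (if any) are never examined.
        for i in ranked:
            pulls[i] += 1
            if rng.random() < w[i]:
                clicks[i] += 1
                break
        miss = 1.0
        for i in ranked:
            miss *= 1 - w[i]
        regret += best - (1 - miss)       # per-round expected regret
    return regret

if __name__ == "__main__":
    w = [0.2] * 2 + [0.1] * 18            # 2 good items among L = 20 (toy instance)
    for fn in (ucb1_index, kl_ucb_index):
        print(fn.__name__, round(run(w, 2, 20000, fn), 1))

The only difference between the two runs is the index: UCB1 adds a Hoeffding-width bonus, while KL-UCB inverts the Bernoulli KL divergence. By Pinsker's inequality, $d(p, q) \ge 2(p - q)^2$, so the KL-UCB interval is never wider than the UCB1 one; the abstract's point is that this slack, harmless in standard $L$-armed bandits, costs a factor of $K$ in the worst case under cascading feedback. A toy run of this size should not be expected to exhibit that worst-case gap.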
