ResearchTrend.AI
arXiv:2405.10027

The Real Price of Bandit Information in Multiclass Classification

16 May 2024
Liad Erez
Alon Cohen
Tomer Koren
Yishay Mansour
Shay Moran
Abstract

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\sqrt{KT}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\widetilde{\Theta}\left(\min\left\{|H| + \sqrt{T}, \sqrt{KT \log |H|}\right\}\right)$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H| + \sqrt{T})$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
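To make the feedback model concrete, here is a minimal simulation of the bandit multiclass protocol over a finite hypothesis class, using a simple Halving-style elimination rule in the realizable case. This is an illustrative sketch of the setting only, not the algorithm proposed in the paper; the function name, the plurality-vote prediction rule, and the example hypotheses are all assumptions for the demo.

```python
def bandit_multiclass_run(hypotheses, true_idx, xs):
    """Simulate bandit multiclass classification (realizable case).

    hypotheses: list of functions x -> label, the finite class H
    true_idx:   index in `hypotheses` of the target hypothesis
    xs:         sequence of inputs presented to the learner
    Returns the number of mistakes made over the sequence.

    Each round the learner predicts a label and observes ONLY whether
    that prediction was correct -- never the true label itself.
    """
    version_space = list(range(len(hypotheses)))  # still-plausible hypotheses
    mistakes = 0
    for x in xs:
        # Predict the plurality label among surviving hypotheses.
        votes = {}
        for i in version_space:
            y = hypotheses[i](x)
            votes[y] = votes.get(y, 0) + 1
        pred = max(votes, key=votes.get)

        # Bandit feedback: a single bit, correct or not.
        correct = pred == hypotheses[true_idx](x)
        if correct:
            # The true label is revealed implicitly; keep agreeing hypotheses.
            version_space = [i for i in version_space
                             if hypotheses[i](x) == pred]
        else:
            mistakes += 1
            # Only the predicted label is refuted; eliminate its voters.
            version_space = [i for i in version_space
                             if hypotheses[i](x) != pred]
    return mistakes
```

Note the asymmetry that drives the $K$-dependence: a correct prediction pins down the true label, while an incorrect one rules out just one of the $K$ candidates, so each mistake may shrink the version space by only a $1/K$ fraction.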
