arXiv:1803.01548

Online learning over a finite action set with limited switching

5 March 2018
Jason M. Altschuler
Kunal Talwar
Abstract

This paper studies the value of switching actions in the Prediction From Experts (PFE) problem and Adversarial Multi-Armed Bandits (MAB) problem. First, we revisit the well-studied and practically motivated setting of PFE with switching costs. Many algorithms are known to achieve the minimax optimal order of $O(\sqrt{T \log n})$ in expectation for both regret and number of switches, where $T$ is the number of iterations and $n$ the number of actions. However, no high probability (h.p.) guarantees are known. Our main technical contribution is the first algorithms which with h.p. achieve this optimal order for both regret and switches. This settles an open problem of [Devroye et al., 2015], and directly implies the first h.p. guarantees for several problems of interest. Next, to investigate the value of switching actions at a more granular level, we introduce the setting of switching budgets, in which algorithms are limited to $S \leq T$ switches between actions. This entails a limited number of free switches, in contrast to the unlimited number of expensive switches in the switching cost setting. Using the above result and several reductions, we unify previous work and completely characterize the complexity of this switching budget setting up to small polylogarithmic factors: for both PFE and MAB, for all switching budgets $S \leq T$, and for both expectation and h.p. guarantees. For PFE, we show the optimal rate is $\tilde{\Theta}(\sqrt{T\log n})$ for $S = \Omega(\sqrt{T\log n})$, and $\min(\tilde{\Theta}(\tfrac{T\log n}{S}), T)$ for $S = O(\sqrt{T \log n})$. Interestingly, the bandit setting does not exhibit such a phase transition; instead we show the minimax rate decays steadily as $\min(\tilde{\Theta}(\tfrac{T\sqrt{n}}{\sqrt{S}}), T)$ for all ranges of $S \leq T$. These results recover and generalize the known minimax rates for the (arbitrary) switching cost setting.
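
As a rough numerical illustration of the rates stated in the abstract, the sketch below evaluates their order of growth for given $T$, $n$, and $S$. The function names `pfe_rate` and `mab_rate` are hypothetical, and the constants and polylogarithmic factors hidden by $\tilde{\Theta}$ are simply dropped (assumed to be 1), so the outputs indicate scaling only and are not actual regret bounds.

```python
import math


def _check(T: int, n: int, S: int) -> None:
    # Assumed valid range for this sketch: at least 2 actions, budget 1 <= S <= T.
    if n < 2 or S < 1 or S > T:
        raise ValueError("require n >= 2 and 1 <= S <= T")


def pfe_rate(T: int, n: int, S: int) -> float:
    """Order of the PFE minimax regret under a switching budget S.

    Constants and polylog factors are dropped (assumption of this sketch).
    """
    _check(T, n, S)
    threshold = math.sqrt(T * math.log(n))
    if S >= threshold:
        # Large-budget regime: the rate no longer improves with S.
        return threshold
    # Small-budget regime: the rate scales as T log(n) / S, capped at T.
    return min(T * math.log(n) / S, float(T))


def mab_rate(T: int, n: int, S: int) -> float:
    """Order of the MAB minimax regret under a switching budget S.

    No phase transition: the rate decays as T sqrt(n) / sqrt(S), capped at T.
    """
    _check(T, n, S)
    return min(T * math.sqrt(n) / math.sqrt(S), float(T))


if __name__ == "__main__":
    T, n = 10_000, 16
    for S in (1, 10, 100, 1_000, 10_000):
        print(S, round(pfe_rate(T, n, S), 1), round(mab_rate(T, n, S), 1))
```

Under these simplifications, the PFE value stops decreasing once $S$ exceeds $\sqrt{T \log n}$, while the MAB value keeps decreasing all the way to $S = T$, which is the contrast the abstract describes.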
