Smooth Non-Stationary Bandits

29 January 2023
S. Jia
Qian Xie
Nathan Kallus
P. Frazier
Abstract

In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandits problem where we assume that an arm's mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omega(T^{(\beta+1)/(2\beta+1)})$ lower bound for any integer $\beta \ge 1$, which matches our upper bound for $\beta=2$.
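
To make the setting concrete, below is a minimal simulation sketch of a two-armed bandit whose mean rewards drift smoothly over normalized time (here sinusoidal, so the means are twice continuously differentiable, consistent with $\beta=2$ Hölder smoothness), run against a simple restarting explore-then-commit baseline. This is not the policy from the paper; the reward functions, epoch schedule, and parameters (n_epochs, explore_frac) are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's policy): a two-armed bandit whose mean
# rewards vary smoothly over normalized time t/T, paired with a naive
# epoch-based explore-then-commit baseline. All parameter choices here are
# assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

T = 100_000          # horizon
n_epochs = 20        # number of restarts for the baseline (illustrative)
explore_frac = 0.1   # fraction of each epoch spent exploring both arms

def mean_reward(arm: int, t: int) -> float:
    """Smoothly varying means as functions of normalized time s = t/T."""
    s = t / T
    if arm == 0:
        return 0.5 + 0.2 * np.sin(2 * np.pi * s)
    return 0.5 + 0.2 * np.cos(2 * np.pi * s)

regret = 0.0
epoch_len = T // n_epochs
for e in range(n_epochs):
    start = e * epoch_len
    explore_len = int(explore_frac * epoch_len)
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(start, start + epoch_len):
        if t - start < explore_len:
            arm = (t - start) % 2                              # round-robin exploration
        else:
            arm = int(np.argmax(sums / np.maximum(counts, 1))) # commit to empirical best
        mu = [mean_reward(0, t), mean_reward(1, t)]
        reward = rng.binomial(1, mu[arm])                      # Bernoulli reward
        sums[arm] += reward
        counts[arm] += 1
        regret += max(mu) - mu[arm]                            # dynamic (per-round) regret

print(f"dynamic regret of epoch-based baseline over T={T}: {regret:.1f}")
```

The paper's point is that a policy exploiting the smoothness of the drift can do better than such generic restart-style baselines, achieving $\tilde O(T^{3/5})$ regret for $\beta=2$ versus the $\tilde \Theta(T^{2/3})$ rate of non-smooth approaches.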
