  3. 2103.05324
A Simple Approach for Non-stationary Linear Bandits

9 March 2021
Peng Zhao
Lijun Zhang
Yuan Jiang
Zhi-Hua Zhou
Abstract

This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter evolves over time. Existing studies develop various algorithms and show that they enjoy an $\widetilde{\mathcal{O}}(T^{2/3}P_T^{1/3})$ dynamic regret, where $T$ is the time horizon and $P_T$ is the path-length that measures the fluctuation of the evolving unknown parameter. In this paper, we discover that a serious technical flaw makes their results ungrounded, and then present a fix, which gives an $\widetilde{\mathcal{O}}(T^{3/4}P_T^{1/4})$ dynamic regret without modifying the original algorithms. Furthermore, we demonstrate that instead of using sophisticated mechanisms, such as sliding windows or weighted penalties, a simple restarted strategy suffices to attain the same regret guarantee. Specifically, we design a UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of the unknown parameter. Our approach enjoys an $\widetilde{\mathcal{O}}(T^{3/4}P_T^{1/4})$ dynamic regret. Note that to achieve this bound, the algorithm requires oracle knowledge of the path-length $P_T$. By combining the bandits-over-bandits mechanism, treating our algorithm as the base learner, we can further achieve the same regret bound in a parameter-free way. Empirical studies also validate the effectiveness of our approach.
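The restarted strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `restarted_linucb`, the callback `reward_fn`, and the fixed values of `epoch_len`, `lam`, and `beta` are all hypothetical choices for the sketch; in the paper the restart period is tuned using (oracle) knowledge of the path-length $P_T$.

```python
import numpy as np

def restarted_linucb(contexts, reward_fn, T, epoch_len, lam=1.0, beta=1.0):
    """Restarted LinUCB sketch: run a UCB-type linear bandit algorithm,
    but discard all accumulated statistics every `epoch_len` rounds so
    that stale data about an old parameter cannot mislead the estimator
    after the unknown parameter has drifted.

    contexts[t] is a (K, d) array of arm feature vectors at round t;
    reward_fn(t, x) returns the noisy reward of playing feature vector x.
    """
    d = contexts[0].shape[1]
    total_reward = 0.0
    for t in range(T):
        if t % epoch_len == 0:           # periodic restart
            A = lam * np.eye(d)          # regularized Gram matrix
            b = np.zeros(d)              # running sum of r_t * x_t
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b            # ridge estimate of the parameter
        X = contexts[t]
        # UCB score: estimated reward plus an exploration bonus
        # proportional to the Mahalanobis norm of each arm under A^{-1}
        bonus = beta * np.sqrt(np.sum((X @ A_inv) * X, axis=1))
        arm = int(np.argmax(X @ theta_hat + bonus))
        x = X[arm]
        r = reward_fn(t, x)
        A += np.outer(x, x)              # update statistics within epoch
        b += r * x
        total_reward += r
    return total_reward
```

A parameter-free variant, as the abstract notes, would instead run a bandits-over-bandits layer over a grid of candidate epoch lengths, treating each instance of this procedure as a base learner.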
