arXiv:1708.01799

Efficient Contextual Bandits in Non-stationary Worlds

5 August 2017
Haipeng Luo
Chen-Yu Wei
Alekh Agarwal
John Langford
Abstract

Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\mathcal{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\mathcal{O}(\Delta^{1/3}T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the optimal guarantees achieved by an inefficient baseline that is a variant of the classic Exp4 algorithm. The dynamic regret result is also the first one for efficient and fully adversarial contextual bandits. Furthermore, while the results above require tuning a parameter based on the unknown quantity $S$ or $\Delta$, we also develop a parameter-free algorithm achieving regret $\min\{S^{1/4}T^{3/4}, \Delta^{1/5}T^{4/5}\}$. This improves and generalizes the best existing result $\Delta^{0.18}T^{0.82}$ by Karnin and Anava (2016), which only holds for the two-armed bandit problem.
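The core idea the abstract describes, running an i.i.d. base algorithm and restarting it when a statistical test flags a distribution change, can be sketched as follows. This is a deliberately crude illustration, not the paper's actual test: here the "test" is a simple mean-shift check comparing the average reward over a recent window against the average since the last restart, with hypothetical `window` and `threshold` parameters; the paper's tests instead compare regret estimates with confidence bounds.

```python
def change_detected(history, window, threshold):
    """Crude mean-shift test (illustrative only): fire when the mean of
    the most recent `window` rewards drifts from the mean of all earlier
    rewards since the last restart by more than `threshold`."""
    if len(history) < 2 * window:
        return False
    recent = history[-window:]
    past = history[:-window]
    return abs(sum(recent) / window - sum(past) / len(past)) > threshold

def run_with_restarts(rewards, window=20, threshold=0.3):
    """Feed a reward stream to the test; whenever it fires, discard the
    accumulated statistics (i.e., restart the base algorithm) and count
    the restart. Returns the number of restarts triggered."""
    history, restarts = [], 0
    for r in rewards:
        history.append(r)
        if change_detected(history, window, threshold):
            history, restarts = [], restarts + 1
    return restarts

# A stream with one abrupt change triggers exactly one restart,
# while a stationary stream triggers none.
print(run_with_restarts([0.2] * 100 + [0.9] * 100))  # → 1
print(run_with_restarts([0.5] * 200))                # → 0
```

In the paper the restarted object is a full contextual bandit algorithm with its policy statistics, not a bare reward buffer, but the control flow — monitor, test, reset on detection — is the same.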
