Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Abstract

We investigate the non-stationary stochastic linear bandit problem, where the reward distribution evolves each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, the sum of the changes between consecutive feature vectors of the linear bandits over $K$ rounds. However, this quantity measures the non-stationarity only with respect to the expectation of the reward distribution, which makes existing algorithms sub-optimal under general non-stationary reward distributions. In this work, we propose algorithms that exploit the variance of the reward distribution in addition to $B_K$, and show that they achieve tighter regret upper bounds. Specifically, we introduce two novel algorithms: Restarted WeightedOFUL$^+$ and Restarted SAVE$^+$, which address the cases where the variance information of the rewards is known and unknown, respectively. Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary stochastic linear bandits under different settings. Experimental evaluations further validate the superior performance of our proposed algorithms over existing works.
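
For concreteness, the two non-stationarity measures can be written as follows; this is a minimal sketch using the conventions common in this literature, where $\theta_k$ denotes the unknown parameter (feature) vector governing rewards at round $k$ and $\sigma_k^2$ the variance of the reward noise at round $k$ (these symbols are assumptions, not taken from the abstract):

\[
B_K = \sum_{k=1}^{K-1} \lVert \theta_{k+1} - \theta_k \rVert_2,
\qquad
V_K = \sum_{k=1}^{K} \sigma_k^2 .
\]

Under these definitions, when the per-round variances are small (e.g., nearly deterministic rewards), $V_K \ll K$, which is the regime in which variance-dependent bounds can improve over bounds that depend only on $B_K$ and $K$.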
