
Tracking the Best Expert in Non-stationary Stochastic Environments

Abstract

We study the dynamic regret of the multi-armed bandit and expert problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this quantity affects the regret. We investigate the interaction between $\Lambda$ and $\Gamma$, which counts the number of times the distributions change, as well as between $\Lambda$ and $V$, which measures how far the distributions deviate over time. One striking result is that even when $\Gamma$, $V$, and $\Lambda$ are all restricted to constants, the regret lower bound in the bandit setting still grows with $T$. Another highlight is that in the full-information setting, constant regret becomes achievable with constant $\Gamma$ and $\Lambda$, as the regret can then be made independent of $T$, whereas with constant $V$ and $\Lambda$ the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper bound guarantees, but also prove matching lower bounds.
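
To make the abstract's verbal definitions concrete, the following is one plausible formalization; the abstract only describes the parameters informally, so the choice of norm and the exact form of the variance term here are assumptions, with $\mu_t$ denoting the mean of the loss distribution at round $t$ and $\ell_t$ the realized loss vector:
\[
\Gamma \;=\; 1 + \sum_{t=2}^{T} \mathbf{1}\{\mu_t \neq \mu_{t-1}\},
\qquad
V \;=\; \sum_{t=2}^{T} \|\mu_t - \mu_{t-1}\|_\infty,
\qquad
\Lambda \;=\; \sum_{t=1}^{T} \mathbb{E}\!\left[\|\ell_t - \mu_t\|_\infty^2\right].
\]
Under such a formalization, $\Gamma$ is constant when the distribution changes only a bounded number of times, $V$ is constant when the cumulative drift of the means is bounded, and $\Lambda$ is constant when the losses are nearly deterministic around their means.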

