Online Learning in Dynamically Changing Environments

We study the problem of online learning and online regret minimization when samples are drawn from a general unknown non-stationary process. We introduce the concept of a dynamically changing process with cost $K$, where the conditional marginals of the process can vary arbitrarily, but the number of different conditional marginals is bounded by $K$ over $T$ rounds. For such processes we prove a tight (up to a $\sqrt{\log T}$ factor) bound $O(\sqrt{KT\cdot \mathsf{VC}(\mathcal{H})\log T})$ on the expected worst-case regret of any finite VC-dimensional class $\mathcal{H}$ under absolute loss (i.e., the expected misclassification loss). We then improve this bound for general mixable losses by establishing a tight (up to a $\log^3 T$ factor) regret bound $O(K\cdot \mathsf{VC}(\mathcal{H})\log^3 T)$. We extend these results to general smooth adversary processes with an unknown reference measure by showing a sub-linear regret bound for $1$-dimensional threshold functions under a general bounded convex loss. Our results can be viewed as a first step towards regret analysis with non-stationary samples in the distribution-blind (universal) regime. This also brings a new viewpoint that shifts the study of the complexity of hypothesis classes to the study of the complexity of the processes generating the data.
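As a minimal formalization sketch of the setting described above (under assumed notation: $x_t$ are instances, $y_t \in \{0,1\}$ labels, $\hat{y}_t \in [0,1]$ the learner's prediction, and $\mu_t$ the conditional marginal of the $t$-th example given the past), the expected worst-case regret of a class $\mathcal{H}$ under absolute loss can be written as
\[
  R_T(\mathcal{H}) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \bigl|\hat{y}_t - y_t\bigr| \;-\; \inf_{h \in \mathcal{H}} \sum_{t=1}^{T} \bigl|h(x_t) - y_t\bigr|\right],
\]
and, following the abstract, a process has cost $K$ when the number of distinct conditional marginals over the $T$ rounds is at most $K$, i.e. $\bigl|\{\mu_1,\dots,\mu_T\}\bigr| \le K$.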