Uniform Generalization, Concentration, and Adaptive Learning

Abstract

One of the fundamental goals in any learning algorithm is to minimize its risk of overfitting. Mathematically, this means that the learning algorithm enjoys a small generalization risk, which is defined either in expectation or in probability. Both types of generalization are commonly used in the literature. For instance, generalization in expectation has been used to analyze algorithms such as ridge regression and SGD, whereas generalization in probability underlies VC theory and the PAC-Bayesian framework, among others. Recently, however, a third notion of generalization has been studied, called uniform generalization, which requires that the generalization risk vanish uniformly in expectation across all bounded parametric losses. It has been shown that uniform generalization is, in fact, equivalent to an algorithmic stability constraint, and that it recovers classical results in learning theory. However, the relationship between uniform generalization and concentration remained unknown. In this paper, we answer this question by proving that, while generalization in expectation does not imply generalization in probability, uniform generalization in expectation does imply concentration. We establish a chain rule for uniform generalization and use it to derive a tight deviation bound. The chain rule also reveals that learning algorithms that satisfy uniform generalization are amenable to adaptive composition, thereby improving upon earlier results that relied on stronger conditions, such as differential privacy, sample compression, or typical stability.
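
To fix ideas, the three notions of generalization contrasted above can be sketched in LaTeX as follows. The notation (training sample S, learned hypothesis H, loss ell bounded in [0,1]) is illustrative and not taken from the paper.

% Illustrative sketch, under assumed notation: S = (Z_1, ..., Z_m) ~ D^m
% is the training sample, H = L(S) the learned hypothesis, \ell a loss
% bounded in [0,1], R_\ell(h) the true risk, and \hat{R}_{\ell,S}(h) the
% empirical risk of h on S.

% (1) Generalization in expectation: the expected gap vanishes as m grows.
\mathbb{E}_{S}\bigl[ R_\ell(H) - \hat{R}_{\ell,S}(H) \bigr] \;\to\; 0

% (2) Generalization in probability (PAC-style): the gap concentrates.
\forall \epsilon > 0: \quad
\Pr_{S}\bigl[\, R_\ell(H) - \hat{R}_{\ell,S}(H) \ge \epsilon \,\bigr] \;\to\; 0

% (3) Uniform generalization: the expected gap vanishes uniformly over
%     all parametric losses \ell : \mathcal{H} \times \mathcal{Z} \to [0,1].
\sup_{\ell}\, \Bigl| \mathbb{E}_{S}\bigl[ R_\ell(H) - \hat{R}_{\ell,S}(H) \bigr] \Bigr| \;\to\; 0

In this notation, the paper's main result can be read as: (3) implies (2), whereas (1) alone does not.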
