
Dimensionality-Dependent Generalization Bounds for $k$-Dimensional Coding Schemes

Abstract

The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors; they include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for $k$-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? The answer is positive. In this paper, we address this problem and derive a dimensionality-dependent generalization bound for $k$-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order $\mathcal{O}\left(\left(mk\ln(mkn)/n\right)^{\lambda_n}\right)$, where $m$ is the dimension of the features, $k$ is the number of columns in the linear implementation of the coding scheme, $n$ is the sample size, $\lambda_n > 0.5$ when $n$ is finite, and $\lambda_n = 0.5$ when $n$ is infinite. We show that our bound can be tighter than previous results, because it avoids inducing the worst-case upper bound on $k$ of the loss function and converges faster. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to these dimensionality-independent generalization bounds.
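To make the rate concrete, the following minimal Python sketch evaluates the dominant term $\left(mk\ln(mkn)/n\right)^{\lambda_n}$ of the bound for illustrative parameter values. The function name and the specific exponent value $\lambda_n = 0.55$ are assumptions chosen for illustration only; the paper's exact expression for $\lambda_n$ is not reproduced in this abstract.

```python
import math

def coding_bound_rate(m: int, k: int, n: int, lam: float) -> float:
    """Dominant rate term O((m k ln(m k n) / n)^lambda_n) from the abstract.

    m   -- dimension of the features
    k   -- number of columns in the linear implementation of the coding scheme
    n   -- sample size
    lam -- exponent lambda_n; per the abstract, lambda_n > 0.5 for finite n
           and lambda_n = 0.5 as n grows to infinity (exact form not given here)
    """
    return (m * k * math.log(m * k * n) / n) ** lam

# Illustrative values only: a 100-dimensional feature space, a 10-column
# codebook, growing sample sizes, and a hypothetical lambda_n = 0.55.
for n in (10**3, 10**5, 10**7):
    print(f"n = {n:>8}: rate term = {coding_bound_rate(100, 10, n, 0.55):.4f}")
```

As the printed values shrink with $n$, the sketch shows how the bound tightens with more samples; a larger exponent $\lambda_n > 0.5$ yields faster decay than the $\lambda_n = 0.5$ rate typical of dimensionality-independent bounds.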
