Learning with Spectral Kernels and Heavy-Tailed Data

Heavy-tailed data, e.g., graphs in which the degree sequence decays according to a power law, are ubiquitous in applications. In many of those applications, spectral kernels, e.g., Laplacian Eigenmaps and Diffusion Maps, are commonly used analytic tools. We establish learnability results applicable in both settings. Our first result is an exact learning bound for learning a classification hyperplane when the components of the feature vector decay according to a power law. Thus, although the data distribution is infinite-dimensional and unbounded, a nearly optimal linear classification hyperplane is learnable because of the polynomial decay in the probability that a given feature of a random data point is non-zero. Our second result is a ``gap-tolerant'' learning bound for learning a nearly optimal margin classification hyperplane when the kernel is constructed according to the Diffusion Maps procedure. Both proofs bound the annealed entropy and thus make essential use of distribution-dependent information. The proof of our first result is direct, while the proof of our second result uses, as an intermediate step, a commonly accepted but not yet rigorously proved bound on the VC dimension of gap-tolerant classifiers. We offer a rigorous proof of this bound for the usual case in which the margin is measured in the Euclidean norm, and we prove a generalization to the case in which the data need not have compact support and the margin may be measured with respect to more general norms.
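For readers unfamiliar with the spectral kernels mentioned above, the following is a minimal sketch of the standard Diffusion Maps construction (in the spirit of Coifman and Lafon), included only to illustrate the kind of kernel the second result refers to; it is not the specific setup analyzed in the paper, and the bandwidth eps and diffusion time t below are illustrative choices.

# Minimal sketch of a standard Diffusion Maps embedding; parameters are illustrative.
import numpy as np

def diffusion_map(X, eps=1.0, t=1, n_components=2):
    """Embed rows of X (n_samples x n_features) via a diffusion-map spectral kernel."""
    # Gaussian affinity kernel on pairwise squared Euclidean distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)

    # Row-normalize to obtain a Markov transition matrix P = D^{-1} K.
    d = K.sum(axis=1)

    # Eigendecompose the symmetric conjugate A = D^{-1/2} K D^{-1/2},
    # which has the same spectrum as P.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P; drop the trivial top eigenpair (eigenvalue 1).
    psi = d_inv_sqrt[:, None] * eigvecs
    return (eigvals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Y = diffusion_map(X, eps=2.0, t=2, n_components=3)
    print(Y.shape)  # (200, 3)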