
Learning with Spectral Kernels and Heavy-Tailed Data

Abstract

Heavy-tailed data, e.g., graphs in which the degree sequence decays according to a power law, are ubiquitous in applications. In many of those applications, spectral kernels, e.g., Laplacian Eigenmaps and Diffusion Maps, are commonly-used analytic tools. We establish learnability results applicable in both settings. Our first result is an exact learning bound for learning a classification hyperplane when the components of the feature vector decay according to a power law. Thus, although the distribution of data is infinite-dimensional and unbounded, a nearly optimal linear classification hyperplane is learnable due to the polynomial decay in the probability that the $i^{th}$ feature of a random data point is non-zero. Our second result is a "gap-tolerant" learning bound for learning a nearly-optimal $\Delta$-margin classification hyperplane when the kernel is constructed according to the Diffusion Maps procedure. Each proof bounds the annealed entropy and thus makes important use of distribution-dependent information. The proof of our first result is direct, while the proof of our second result uses as an intermediate step a commonly-accepted but not yet rigorously-proved bound for the VC dimension of gap-tolerant classifiers. We offer a rigorous proof of this result for the usual case where the margin is measured in the $\ell_2$ norm, and we prove a generalization of this result to the case where the data need not have compact support and where the margin may be measured with respect to the more general $\ell_p$ norm.
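
For orientation, the commonly-accepted bound referred to above is usually stated in the following standard form (written here only as an illustration of the folklore statement, not as the paper's own result): if the data lie in a ball of radius $R$ in $\ell_2$ and hyperplanes are required to separate with margin at least $\Delta$, then the VC dimension $h$ of the resulting gap-tolerant classifiers in $\mathbb{R}^n$ is taken to satisfy

$$h \;\le\; \min\!\left(\left\lceil \frac{R^2}{\Delta^2} \right\rceil,\; n\right) + 1.$$

Note that this form presumes compact support (the radius-$R$ ball), which is exactly the assumption the paper's $\ell_p$ generalization relaxes.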
