Analysis of p-Laplacian Regularization in Semi-Supervised Learning

We investigate a family of regression problems in a semi-supervised setting. The task is to assign real-valued labels to a set of n sample points, provided a small training subset of labeled points. A goal of semi-supervised learning is to take advantage of the (geometric) structure provided by the large number of unlabeled data points when assigning labels. We consider a random geometric graph, with connection radius ε(n), to represent the geometry of the data set. We study objective functions which reward the regularity of the estimator function and impose or reward agreement with the training data. In particular, we consider discrete p-Laplacian regularization. We investigate the asymptotic behavior in the limit where the number of unlabeled points increases while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals considered and the nonlocality inherent to the graph constructions. We rigorously obtain almost optimal ranges on the scaling of ε(n) for the asymptotic consistency to hold. We discover that for the standard approaches used thus far there is a restrictive upper bound on how quickly ε(n) must converge to zero as n → ∞. Furthermore, we introduce a new model which overcomes this restriction: it is as simple as the standard models, but converges as soon as ε(n) → 0 as n → ∞.
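To make the setting concrete, the following is a minimal sketch of discrete p-Laplacian regularization on a random geometric graph: minimize the discrete p-Dirichlet energy while forcing agreement with the labeled points. All names and parameters (n, eps, p, the labels, and the plain projected-gradient solver) are illustrative choices for this sketch, not the paper's construction or its consistency regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, p = 200, 0.2, 2.0  # p = 2 recovers the graph-harmonic case

# Sample points in the unit square and build the random geometric
# graph: unit edge weight between points within distance eps.
X = rng.random((n, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
W = ((D < eps) & (D > 0)).astype(float)

# A small, fixed training set with hard label constraints.
labeled = np.array([0, 1, 2])
y = np.array([0.0, 1.0, 0.5])

# Minimize the p-Dirichlet energy  sum_{i,j} W_ij |u_i - u_j|^p
# subject to u[labeled] = y, via projected gradient descent.
u = np.full(n, y.mean())
u[labeled] = y
for _ in range(2000):
    diff = u[:, None] - u[None, :]
    grad = p * (W * np.sign(diff) * np.abs(diff) ** (p - 1)).sum(axis=1)
    u -= 1e-3 * grad
    u[labeled] = y  # re-impose agreement with the training data
```

With n fixed at a moderate size and eps well above the connectivity threshold, the estimator u interpolates smoothly between the labeled values; the regimes in which such estimators degenerate (e.g. ε(n) shrinking too quickly) are exactly what the asymptotic analysis addresses.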