Consider the classical supervised learning problem: we are given data $(y_i, x_i)$, $i \le n$, with $y_i$ a response and $x_i$ a covariates vector, and try to learn a model to predict future responses. Random features methods map the covariates vector $x_i$ to a point $\phi(x_i)$ in a higher dimensional space $\mathbb{R}^N$, via a random featurization map $\phi$. We study the use of random features methods in conjunction with ridge regression in the feature space $\mathbb{R}^N$. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime.

We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big should $N$ be for the random features approximation to achieve the same error as KRR?

In this setting, we prove that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample size $n$. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N \le n^{1-\delta}$ for some $\delta > 0$. We characterize this gap. For $N \ge n^{1+\delta}$, random features achieve the same error as the corresponding KRR, and further increasing $N$ does not lead to a significant change in test error.
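To make the setup concrete, below is a minimal sketch of random features ridge regression. The specific choices (Gaussian covariates, a ReLU featurization map $\phi(x) = \max(Wx, 0)$ with random weights $W$, the target function, and the regularization level) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of random features ridge regression.
# Data model, featurization, and hyperparameters are assumptions for this example.
rng = np.random.default_rng(0)
n, d, N = 500, 20, 2000        # sample size, covariate dimension, number of random features
lam = 1e-3                     # ridge regularization strength (arbitrary choice)

# Synthetic data: covariates x_i in R^d, responses y_i in R.
beta = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sin(X @ beta / np.sqrt(d)) + 0.1 * rng.standard_normal(n)

# Random featurization map phi(x) = ReLU(W x), with W drawn once at random.
W = rng.standard_normal((N, d)) / np.sqrt(d)

def phi(X):
    return np.maximum(X @ W.T, 0.0)

# Ridge regression in the feature space R^N:
#   a_hat = argmin_a  (1/n) * ||y - phi(X) a||^2 + lam * ||a||^2
Phi = phi(X)
a_hat = np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(N), Phi.T @ y / n)

# Test error; for N large relative to n, this tracks the error of the corresponding KRR.
X_te = rng.standard_normal((n, d))
y_te = np.sin(X_te @ beta / np.sqrt(d))
test_err = np.mean((phi(X_te) @ a_hat - y_te) ** 2)
print(f"random features ridge regression test error: {test_err:.4f}")
```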