arXiv:2101.10588
Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

26 January 2021
Song Mei
Theodor Misiakiewicz
Andrea Montanari
Abstract

Consider the classical supervised learning problem: we are given data $(y_i, {\boldsymbol x}_i)$, $i \le n$, with $y_i$ a response and ${\boldsymbol x}_i \in {\mathcal X}$ a covariates vector, and try to learn a model $f: {\mathcal X} \to {\mathbb R}$ to predict future responses. Random features methods map the covariates vector ${\boldsymbol x}_i$ to a point ${\boldsymbol \phi}({\boldsymbol x}_i)$ in a higher-dimensional space ${\mathbb R}^N$, via a random featurization map ${\boldsymbol \phi}$. We study the use of random features methods in conjunction with ridge regression in the feature space ${\mathbb R}^N$. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big should $N$ be for the random features approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample size $n$. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N \le n^{1-\delta}$ for some $\delta > 0$. We characterize this gap. For $N \ge n^{1+\delta}$, random features achieve the same error as the corresponding KRR, and further increasing $N$ does not lead to a significant change in test error.
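
The comparison described in the abstract can be made concrete with a small numerical sketch. The snippet below is not the authors' code and does not reproduce their general setting; it assumes a specific illustrative instance (ReLU random features on the sphere, whose infinite-width limit is the arc-cosine kernel, a hand-picked ridge penalty lam, and a simple low-degree target f_star) and compares random features ridge regression for growing $N$ against KRR with the limiting kernel, using only NumPy.

    # Minimal sketch (illustrative assumptions: ReLU features, spherical data,
    # arc-cosine limiting kernel, ad hoc ridge penalty). Not the paper's code.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 400, 15, 1e-3                  # sample size, dimension, ridge penalty

    def sphere(m):
        Z = rng.standard_normal((m, d))
        return Z / np.linalg.norm(Z, axis=1, keepdims=True)

    def arccos_kernel(A, B):
        # K(x, x') = E_w[relu(w.x) relu(w.x')] for w ~ N(0, I_d), x, x' on the unit sphere.
        cos = np.clip(A @ B.T, -1.0, 1.0)
        theta = np.arccos(cos)
        return (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

    X, Xt = sphere(n), sphere(200)
    f_star = lambda Z: Z[:, 0] + Z[:, 0] * Z[:, 1]   # simple low-degree target
    y = f_star(X) + 0.1 * rng.standard_normal(n)

    # Kernel ridge regression (the N -> infinity limit of the random features model).
    K = arccos_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    krr_err = np.mean((arccos_kernel(Xt, X) @ alpha - f_star(Xt)) ** 2)
    print(f"KRR test MSE: {krr_err:.4f}")

    # Random features ridge regression for increasing N: phi(x) = relu(W x) / sqrt(N),
    # then ridge regression in R^N with the same penalty.
    for N in (50, 200, 1000, 2000):
        W = rng.standard_normal((N, d))
        Phi = np.maximum(X @ W.T, 0.0) / np.sqrt(N)          # n x N feature matrix
        a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
        Phit = np.maximum(Xt @ W.T, 0.0) / np.sqrt(N)
        rf_err = np.mean((Phit @ a - f_star(Xt)) ** 2)
        print(f"N = {N:5d}  RF test MSE: {rf_err:.4f}")

On a run of this kind one would expect the random features test error to be noticeably larger than the KRR error when $N$ is small relative to $n$, and to approach the KRR error once $N$ grows past $n$, mirroring the $N \le n^{1-\delta}$ and $N \ge n^{1+\delta}$ regimes characterized in the paper.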
