
Universality of empirical risk minimization

17 February 2022 · arXiv:2202.08832
Andrea Montanari, Basil Saeed
Abstract

Consider supervised learning from i.i.d. samples $\{(\boldsymbol{x}_i, y_i)\}_{i \le n}$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions that are parameterized by $\mathsf{k} = O(1)$ vectors $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_{\mathsf{k}} \in \mathbb{R}^p$, and prove universality results both for the training and test error. Namely, under the proportional asymptotics $n, p \to \infty$ with $n/p = \Theta(1)$, we prove that the training error depends on the random features distribution only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors $\boldsymbol{x}_i$ are replaced by Gaussian vectors $\boldsymbol{g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures, or to feature vectors $\boldsymbol{x}_i$ with independent entries; our results do not make either of these assumptions. Our assumptions are general enough to include feature vectors $\boldsymbol{x}_i$ that are produced by randomized featurization maps. In particular, we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).
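
Schematically, the empirical risk in question has the form $\hat{R}_n(\Theta) = \frac{1}{n} \sum_{i \le n} \ell(y_i; \Theta^{\top} \boldsymbol{x}_i)$ with $\Theta = (\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_{\mathsf{k}})$, and the claim is that its minimum is asymptotically unchanged when $\boldsymbol{x}_i$ is swapped for a covariance-matched Gaussian $\boldsymbol{g}_i$. A minimal numerical sketch of this Gaussian equivalence is below, assuming Python/NumPy, ridge regression with $\mathsf{k} = 1$ as the learning procedure, a relu random-features map, and a linear target over the features; these are illustrative choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 4000, 1000, 500   # proportional regime: n/p stays of order one
lam = 0.1                   # ridge penalty (arbitrary illustrative value)

# Random features map x = relu(W z): the output of a one-layer network
# with random weights, as in the abstract's random features model.
W = rng.standard_normal((p, d)) / np.sqrt(d)
Z = rng.standard_normal((n, d))
X = np.maximum(Z @ W.T, 0.0)
X -= X.mean(axis=0)          # center, so matching covariances suffices

# Gaussian surrogate g_i with the same covariance as x_i.
Sigma = (X.T @ X) / n
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(p))
G = rng.standard_normal((n, p)) @ L.T

# A common linear target over the features, plus independent noise.
beta = rng.standard_normal(p) / np.sqrt(p)
y_x = X @ beta + 0.3 * rng.standard_normal(n)
y_g = G @ beta + 0.3 * rng.standard_normal(n)

def ridge_train_error(A, y):
    """Training MSE of the ridge empirical risk minimizer on (A, y)."""
    theta = np.linalg.solve(A.T @ A / n + lam * np.eye(p), A.T @ y / n)
    return np.mean((y - A @ theta) ** 2)

print("train error, random features :", ridge_train_error(X, y_x))
print("train error, Gaussian features:", ridge_train_error(G, y_g))
# Universality predicts these two numbers concentrate on the same limit
# as n, p -> infinity with n/p fixed.
```

Ridge regression is already covered by earlier strong-convexity-based universality results; the paper's contribution is that the same equivalence holds without strong convexity or independent feature entries, which is what makes the random features and neural tangent examples admissible.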
