Analysis of a Two-Layer Neural Network via Displacement Convexity

5 January 2019
Adel Javanmard
Marco Mondelli
Andrea Montanari
Abstract

Fitting a function by using linear combinations of a large number $N$ of 'simple' components is one of the most fruitful ideas in statistical learning. This idea lies at the core of a variety of methods, from two-layer neural networks to kernel regression, to boosting. In general, the resulting risk minimization problem is non-convex and is solved by gradient descent or its variants. Unfortunately, little is known about global convergence properties of these approaches. Here we consider the problem of learning a concave function $f$ on a compact convex domain $\Omega \subseteq \mathbb{R}^d$, using linear combinations of 'bump-like' components (neurons). The parameters to be fitted are the centers of $N$ bumps, and the resulting empirical risk minimization problem is highly non-convex. We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$. Further, when the bump width $\delta$ tends to $0$, this gradient flow has a limit which is a viscous porous medium equation. Remarkably, the cost function optimized by this gradient flow exhibits a special property known as displacement convexity, which implies exponential convergence rates for $N \to \infty$, $\delta \to 0$. Surprisingly, this asymptotic theory appears to capture well the behavior for moderate values of $\delta, N$. Explaining this phenomenon, and understanding the dependence on $\delta, N$ in a quantitative manner, remains an outstanding challenge.
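To make the setup concrete, the following is a minimal numerical sketch of the problem described in the abstract (not the authors' code): a concave target on a compact domain is fitted by the average of $N$ Gaussian-like bumps of width $\delta$, and only the bump centers are updated by full-batch gradient descent on the empirical squared risk. The specific target, bump profile, normalization, and step size are illustrative assumptions.

```python
# Minimal sketch of the setup in the abstract (illustrative assumptions:
# 1-d domain Omega = [-1, 1], Gaussian bumps normalized to unit mass,
# concave target f(x) = 0.75 * (1 - x^2), plain gradient descent on the centers).
import numpy as np

rng = np.random.default_rng(0)

d = 1                      # dimension of the domain Omega
N = 50                     # number of bumps (neurons)
delta = 0.1                # bump width
lr = 2.0                   # step size (chosen by hand for this sketch)
n_samples = 400            # training points drawn uniformly from Omega

def f(x):
    """Concave, non-negative target on [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - np.sum(x**2, axis=-1))

def bumps(x, w):
    """Matrix of bump values K_delta(x_j - w_i), shape (n_samples, N)."""
    sq = np.sum((x[:, None, :] - w[None, :, :])**2, axis=-1)
    return np.exp(-sq / (2.0 * delta**2)) / (delta * np.sqrt(2.0 * np.pi))**d

X = rng.uniform(-1.0, 1.0, size=(n_samples, d))
y = f(X)
W = rng.uniform(-1.0, 1.0, size=(N, d))      # bump centers: the only parameters

def risk(W):
    """Empirical squared risk of the average-of-bumps predictor."""
    return 0.5 * np.mean((bumps(X, W).mean(axis=1) - y)**2)

print("initial empirical risk:", risk(W))
for _ in range(5000):
    Phi = bumps(X, W)                         # (n_samples, N)
    resid = Phi.mean(axis=1) - y              # prediction is the average of the bumps
    # Exact gradient of the empirical risk w.r.t. each center, using
    # d K_delta(x - w) / d w = K_delta(x - w) * (x - w) / delta^2.
    diff = X[:, None, :] - W[None, :, :]      # (n_samples, N, d)
    grad = (resid[:, None, None] * Phi[:, :, None] * diff).mean(axis=0) / (N * delta**2)
    W -= lr * grad
print("final empirical risk:  ", risk(W))
```

In this picture, each center is a particle; as $N$ grows, the empirical distribution of the centers evolving under gradient descent is the object that the abstract describes as converging to a Wasserstein gradient flow over $\Omega$.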
