
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

13 November 2024
Berfin Şimşek
Amire Bendjeddou
Daniel Hsu
Abstract

This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) = \sum_{j=1}^k \sigma^*(v_j^T x)$, where $v_1, \dots, v_k$ are unit vectors and $\sigma^*$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, for the single-index case ($k = 1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that neurons converge to the nearest index vectors. Therefore, using $n \asymp k \log k$ neurons ensures finding the full set of index vectors with gradient flow with high probability over random initialization. When $v_i^T v_j = \beta \geq 0$ for all $i \neq j$, we prove the existence of a sharp threshold $\beta_c = c/(c+k)$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that using a correlation loss and a mild overparameterization suffices to learn all of the index vectors when they are nearly orthogonal; however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
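The setting described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes $\sigma^* = \mathrm{He}_3$ (the third Hermite polynomial, which indeed has no first- or second-Hermite component), takes the index vectors to be the first $k$ standard basis vectors (the orthogonal case), and approximates gradient flow by projected gradient descent on an empirical correlation loss over a fixed Gaussian sample. All dimensions, step sizes, and sample counts are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 3   # ambient dimension, number of index vectors
n = 12         # student neurons (mild overparameterization, on the order of k log k)
N = 20000      # Monte Carlo samples standing in for the population expectation
lr, steps = 0.05, 300

# Orthogonal index vectors: the first k standard basis vectors.
V = np.eye(d)[:k]

def he3(z):
    # Third Hermite polynomial He3(z) = z^3 - 3z: its Hermite expansion has
    # no He1 or He2 term, matching the assumption on sigma* in the abstract.
    return z**3 - 3.0 * z

def target(X):
    # f*(x) = sum_j He3(v_j^T x)
    return he3(X @ V.T).sum(axis=1)

# Random unit-norm initialization of the student neurons.
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

X = rng.standard_normal((N, d))   # standard Gaussian inputs
y = target(X)

for _ in range(steps):
    pre = X @ W.T                                   # (N, n) preactivations
    # Empirical correlation loss per neuron: -E[f*(x) He3(w^T x)];
    # its gradient in w is -E[f*(x) He3'(w^T x) x], with He3'(z) = 3z^2 - 3.
    grad = -(X.T @ (y[:, None] * (3.0 * pre**2 - 3.0))) / N
    W -= lr * grad.T
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # retract back onto the sphere

# Alignment of each neuron with its nearest index vector (1 means recovery).
align = np.abs(W @ V.T).max(axis=1)
print("mean alignment with nearest index vector:", align.mean())
```

Renormalizing after each step is a simple retraction onto the unit sphere, a standard discrete stand-in for the spherical gradient flow analyzed in the paper; the alignment printout indicates how many neurons have approached an index vector after the search phase.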

@article{şimşek2025_2411.08798,
  title={ Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence },
  author={ Berfin Şimşek and Amire Bendjeddou and Daniel Hsu },
  journal={arXiv preprint arXiv:2411.08798},
  year={ 2025 }
}