Gradient Descent for Low-Rank Functions

16 June 2022
Romain Cosson, Ali Jadbabaie, A. Makur, Amirhossein Reisizadeh, Devavrat Shah
arXiv:2206.08257
Abstract

Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed \emph{Low-Rank Gradient Descent} (LRGD) algorithm finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are $\mathcal{O}(r \log(1/\epsilon) + rp)$ and $\mathcal{O}(r/\epsilon^2 + rp)$, respectively. When $r \ll p$, these complexities are smaller than the known complexities of $\mathcal{O}(p \log(1/\epsilon))$ and $\mathcal{O}(p/\epsilon^2)$ of GD in the strongly convex and non-convex settings, respectively. Thus, LRGD significantly reduces the computational cost of gradient-based methods for sufficiently low-rank functions. In the course of our analysis, we also formally define and characterize the classes of exact and approximately low-rank functions.
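To make the two-phase structure described in the abstract concrete, below is a minimal NumPy sketch of an LRGD-style loop. It is an illustration under stated assumptions, not the paper's implementation: here the significant directions are identified by taking the top-$r$ singular vectors of a few sampled full gradients (the paper's identification procedure may differ), and the per-iteration directional derivatives are approximated by central finite differences rather than an exact directional-derivative oracle. The function names and the toy quadratic objective are hypothetical.

```python
import numpy as np

def directional_derivative(f, x, u, h=1e-6):
    """Central finite-difference estimate of the derivative of f at x along unit direction u."""
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

def lrgd(f, grad_f, x0, r, eta=0.1, n_iters=100, n_samples=None):
    """Illustrative Low-Rank Gradient Descent loop (a sketch, not the paper's exact algorithm).

    Phase 1: identify r significant directions from a few sampled full gradients
    (top-r left singular vectors of the stacked gradient samples).
    Phase 2: at each iteration, estimate the p-dimensional gradient from only r
    directional derivatives, then take a descent step along that estimate.
    """
    p = x0.shape[0]
    n_samples = n_samples or r  # assumption: r gradient samples suffice to span the active subspace

    # Phase 1 (the identification cost, on the order of rp): sample full gradients near x0.
    G = np.stack([grad_f(x0 + np.random.randn(p)) for _ in range(n_samples)], axis=1)  # shape (p, n_samples)
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    directions = U[:, :r]  # orthonormal basis of the estimated active subspace, shape (p, r)

    # Phase 2: iterate using r directional derivatives per step instead of a full p-dimensional gradient.
    x = x0.copy()
    for _ in range(n_iters):
        coeffs = np.array([directional_derivative(f, x, directions[:, i]) for i in range(r)])
        approx_grad = directions @ coeffs  # projection of the true gradient onto the subspace
        x = x - eta * approx_grad
    return x

# Toy usage: an exactly rank-2 quadratic in R^50 whose gradient A A^T x lies in the column space of A.
p, r = 50, 2
A = np.random.randn(p, r) / np.sqrt(p)          # scaled so the default step size 0.1 is stable
f = lambda x: 0.5 * np.sum((A.T @ x) ** 2)
grad_f = lambda x: A @ (A.T @ x)
x_hat = lrgd(f, grad_f, x0=np.random.randn(p), r=r)
print(np.linalg.norm(grad_f(x_hat)))            # gradient norm should be small at the returned point
```

In this sketch, the identification phase spends $r$ full gradient evaluations (roughly the $rp$ term in the stated complexities), while each subsequent iteration spends only $r$ directional derivatives, mirroring the per-iteration savings that the abstract's oracle-complexity bounds describe when $r \ll p$.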
