Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing

20 March 2023
Nived Rajaraman
Devvrit
Aryan Mokhtari
Kannan Ramchandran
arXiv:2303.11453
Abstract

Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, very little is known about the theory behind this success. In this paper, we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem with the ground truth $U_\star \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the mean square error, augmented with a smooth version of a group Lasso regularizer, $\sum_{i=1}^k \| U e_i \|_2$. In particular, we provably show that pruning all the columns below a certain explicit $\ell_2$-norm threshold results in a solution $U_{\text{prune}}$ which has the minimum number of columns $r$, yet is close to the ground truth in training loss. Moreover, in the subsequent fine-tuning phase, gradient descent initialized at $U_{\text{prune}}$ converges at a linear rate to its limit. While our analysis provides insights into the role of regularization in pruning, we also show that running gradient descent in the absence of regularization results in models which are not suitable for greedy pruning, i.e., many columns could have their $\ell_2$ norm comparable to that of the maximum. To the best of our knowledge, our results provide the first rigorous insights on why greedy pruning + fine-tuning leads to smaller models which also generalize well.
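The pipeline described above (regularized training, thresholded column pruning, then unregularized fine-tuning) can be illustrated with a short numerical sketch. The snippet below is not the paper's algorithm or its explicit constants; it is a minimal NumPy demo on synthetic Gaussian measurements, where the step size, regularization weight `lam`, smoothing constant `eps`, and pruning threshold `tau` are assumed values chosen for the demo, and the smoothed group Lasso $\sum_i \sqrt{\|U e_i\|_2^2 + \epsilon}$ is one plausible smoothing of the regularizer in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k, n = 20, 2, 10, 500                    # ambient dim, true rank r, overparameterized width k >> r, #measurements
step, lam, eps, tau = 0.02, 0.05, 1e-4, 0.1    # step size, reg. weight, smoothing constant, pruning threshold (assumed)

U_star = rng.normal(size=(d, r)) / np.sqrt(d)      # ground truth U_star in R^{d x r}
A = rng.normal(size=(n, d, d))                     # Gaussian sensing matrices A_i
y = np.einsum('nij,ij->n', A, U_star @ U_star.T)   # measurements y_i = <A_i, U_star U_star^T>

def fit_grad(U):
    """Gradient of the mean square error (1/n) * sum_i (<A_i, U U^T> - y_i)^2 with respect to U."""
    resid = np.einsum('nij,ij->n', A, U @ U.T) - y
    G = np.einsum('n,nij->ij', resid, A)
    return (2.0 / n) * (G + G.T) @ U

def reg_grad(U):
    """Gradient of a smoothed group Lasso, sum_i sqrt(||U e_i||_2^2 + eps) (one possible smoothing)."""
    return U / np.sqrt((U ** 2).sum(axis=0, keepdims=True) + eps)

# Phase 1: gradient descent on the regularized loss with the overparameterized model U in R^{d x k}.
U = 0.1 * rng.normal(size=(d, k))
for _ in range(5000):
    U -= step * (fit_grad(U) + lam * reg_grad(U))

# Greedy pruning: discard every column whose l2 norm falls below the threshold tau.
keep = np.linalg.norm(U, axis=0) > tau
U_prune = U[:, keep]
print("columns kept after pruning:", U_prune.shape[1], "| true rank:", r)

# Phase 2: fine-tune the pruned model with plain (unregularized) gradient descent.
for _ in range(5000):
    U_prune -= step * fit_grad(U_prune)

rel_err = np.linalg.norm(U_prune @ U_prune.T - U_star @ U_star.T) / np.linalg.norm(U_star @ U_star.T)
print(f"relative error of U_prune U_prune^T after fine-tuning: {rel_err:.2e}")
```

In this toy setup the group Lasso term drives the norms of redundant columns well below those of the informative ones, which is what makes the simple norm-threshold pruning step meaningful; the paper's contrast with unregularized training is that, without the penalty, many columns retain norms comparable to the maximum and no such threshold separates them.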
