arXiv: 2509.23898

Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

28 September 2025
Chris Kolb
Laetitia Frost
B. Bischl
David Rügamer
Main: 9 Pages
25 Figures
Bibliography: 5 Pages
5 Tables
Appendix: 23 Pages
Abstract

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.
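The overparameterization described in the abstract has a simple shape: each weight group $w_g$ is replaced by a primary vector $v_g$ times $D-1$ scalar gates, $w_g = v_g \prod_{d=1}^{D-1} s_{g,d}$, and a smooth penalty on all factors takes the place of the nonsmooth group penalty. (By a standard AM-GM argument, not quoted from the paper, minimizing $\|v_g\|_2^2 + \sum_d s_{g,d}^2$ over factorizations of a fixed $w_g$ yields $D\,\|w_g\|_2^{2/D}$, which is how a smooth surrogate can relate to $L_{2,2/D}$ penalization.) The PyTorch sketch below illustrates that construction for a linear layer whose output rows serve as the groups; the class name DGatedLinear, the grouping choice, the plain squared-L2 penalty, and all hyperparameters are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a gated overparameterization as described in the abstract:
# each weight group (here: one output row of a linear layer) is split into a
# primary weight vector and D-1 scalar gates; the effective weight is their
# product. The squared-L2 penalty on all factors is an assumption on my part;
# see the paper for the exact objective.
import torch
import torch.nn as nn


class DGatedLinear(nn.Module):
    """Linear layer whose output rows are each scaled by D-1 scalar gates."""

    def __init__(self, in_features: int, out_features: int, depth: int = 2):
        super().__init__()
        assert depth >= 2, "depth D must be at least 2 (primary factor plus gates)"
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # (D - 1) scalar gates per output row, i.e. per weight group.
        self.gates = nn.Parameter(torch.ones(depth - 1, out_features))

    def effective_weight(self) -> torch.Tensor:
        # w_g = v_g * prod_d s_{g,d}, broadcast over the input dimension.
        return self.weight * self.gates.prod(dim=0).unsqueeze(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.effective_weight(), self.bias)

    def smooth_penalty(self) -> torch.Tensor:
        # Fully differentiable penalty on every factor, so ordinary SGD/Adam
        # applies without a specialized (proximal) optimizer.
        return self.weight.pow(2).sum() + self.gates.pow(2).sum()


if __name__ == "__main__":
    layer = DGatedLinear(16, 8, depth=3)  # D = 3
    opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
    x, y = torch.randn(32, 16), torch.randn(32, 8)
    for _ in range(200):
        opt.zero_grad()
        loss = (layer(x) - y).pow(2).mean() + 1e-3 * layer.smooth_penalty()
        loss.backward()
        opt.step()
    # Rows whose gate product has collapsed toward zero are pruning candidates.
    print(layer.gates.prod(dim=0).abs().detach())

After training, rows whose gate product is numerically zero can be removed together with their primary vectors, which is the structured-sparsity effect the abstract targets.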
