Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting a fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield them? This paper addresses the challenge of finding the sparsest interpolating ReLU network -- i.e., the network with the fewest nonzero parameters or neurons -- a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to the sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
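The general strategy the abstract describes can be sketched concretely. The following is a minimal, hypothetical illustration of an $\ell^p$-quasinorm-regularized objective for a single-hidden-layer ReLU network; the paper's exact objective, parameterization, and regularization weight `lam` may differ from this generic form.

```python
import numpy as np

def relu_net(x, W, b, v):
    # Single-hidden-layer ReLU network: f(x) = sum_k v_k * relu(<w_k, x> + b_k)
    return np.maximum(x @ W.T + b, 0.0) @ v

def lp_regularized_objective(params, X, y, p=0.5, lam=1e-2):
    """Squared-error data fit plus an l^p quasinorm penalty (0 < p < 1).

    Hypothetical sketch: illustrates the classical sparsity-promoting
    strategy of penalizing sum_i |w_i|^p, which for p < 1 drives many
    parameters exactly to zero. The penalty is continuous and
    differentiable almost everywhere (nondifferentiable only at 0).
    """
    W, b, v = params
    preds = relu_net(X, W, b, v)
    fit = np.mean((preds - y) ** 2)
    weights = np.concatenate([W.ravel(), b.ravel(), v.ravel()])
    penalty = np.sum(np.abs(weights) ** p)  # nonconvex, sparsity-inducing
    return fit + lam * penalty
```

For $p < 1$ the penalty concentrates its "cost" near zero, so moving a small weight exactly to zero reduces the objective more than shrinking a large weight by the same amount; this is why $\ell^p$ penalties promote sparsity where the convex $\ell^1$ norm only shrinks.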
@article{nakhleh2025_2505.21791,
  title   = {Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks},
  author  = {Julia Nakhleh and Robert D. Nowak},
  journal = {arXiv preprint arXiv:2505.21791},
  year    = {2025}
}