
Improved error rates for sparse (group) learning with Lipschitz loss functions

Abstract

We study a family of sparse estimators defined as minimizers of an empirical Lipschitz loss function -- examples include the hinge loss, the logistic loss and the quantile regression loss -- with a convex, sparse or group-sparse regularization. In particular, we consider the L1 norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. We propose a new theoretical framework that uses common assumptions in the literature to simultaneously derive new high-dimensional L2 estimation upper bounds for all three regularization schemes. For L1 and Slope regularizations, our bounds scale as $(k^*/n) \log(p/k^*)$ -- where $n \times p$ is the size of the design matrix and $k^*$ the dimension of the theoretical loss minimizer $\beta^*$ -- and match the optimal minimax rate achieved for the least-squares case. For Group L1-L2 regularization, our bounds scale as $(s^*/n) \log(G/s^*) + m^*/n$ -- where $G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\beta^*$ -- and improve over the least-squares case. We show that, when the signal is strongly group-sparse, Group L1-L2 is superior to L1 and Slope. In addition, we adapt our approach to the sub-Gaussian linear regression framework and reach the optimal minimax rate for the Lasso and an improved rate for the Group-Lasso. Finally, we release an accelerated proximal algorithm that computes the nine main convex estimators of interest when the number of variables is of the order of 100,000s.
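
To make the class of estimators concrete, the sketch below runs an accelerated proximal (FISTA-style) scheme on the logistic loss with either an L1 or a Group L1-L2 penalty. It is a minimal illustration under assumed step-size and iteration choices, not the authors' released solver; the function names (`prox_l1`, `prox_group_l1_l2`, `fista`) are hypothetical.

```python
# Illustrative sketch only: accelerated proximal gradient (FISTA) for a
# Lipschitz loss (logistic) with L1 or Group L1-L2 regularization.
import numpy as np

def prox_l1(v, t):
    """Soft-thresholding: proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group_l1_l2(v, t, groups):
    """Block soft-thresholding: prox of t * sum_g ||v_g||_2 over disjoint groups."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= t else (1.0 - t / norm) * v[g]
    return out

def logistic_grad(beta, X, y):
    """Gradient of the (Lipschitz) logistic loss, with labels y in {-1, +1}."""
    margins = y * (X @ beta)
    return -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)

def fista(X, y, lam, prox, n_iter=500):
    """FISTA with fixed step size 1/L, where L upper-bounds the smoothness
    constant of the logistic loss gradient."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    beta, z, tk = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        beta_next = prox(z - logistic_grad(z, X, y) / L, lam / L)
        tk_next = (1.0 + np.sqrt(1.0 + 4.0 * tk ** 2)) / 2.0
        z = beta_next + ((tk - 1.0) / tk_next) * (beta_next - beta)
        beta, tk = beta_next, tk_next
    return beta
```

For example, `fista(X, y, lam, prox_l1)` returns the L1-regularized logistic estimate, while passing `lambda v, t: prox_group_l1_l2(v, t, groups)` with a list of disjoint index arrays gives the Group L1-L2 version.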
