Understanding Best Subset Selection: A Tale of Two C(omplex)ities

Abstract

We consider the problem of best subset selection (BSS) under a high-dimensional sparse linear regression model. Recently, Guo et al. (2020) showed that the model selection performance of BSS depends on a certain identifiability margin, a measure that captures the model discriminative power of BSS under a general correlation structure and is robust to the design dependence, unlike that of its computational surrogates such as LASSO, SCAD, and MCP. Expanding on this work, we broaden the theoretical understanding of BSS and show that two further quantities play fundamental roles in characterizing the margin condition for model consistency: the complexity of the residualized signals, i.e., the portion of the signals orthogonal to the true active features, and the complexity of the spurious projections, i.e., the projection operators associated with the irrelevant features. In particular, we establish both necessary and sufficient margin conditions that depend only on the identifiability margin and these two complexity measures. We also partially extend our sufficiency result to high-dimensional sparse generalized linear models (GLMs).
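To fix ideas, the estimator studied in the paper can be sketched in its brute-force form as an exhaustive search over all size-k supports, keeping the subset with the smallest residual sum of squares. The following minimal Python sketch uses hypothetical toy data (function name, dimensions, and signal values are illustrative, not from the paper):

```python
# Minimal sketch of best subset selection (BSS) for sparse linear
# regression: enumerate all size-k feature subsets and keep the one
# minimizing the residual sum of squares (RSS).
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Return the index set S of size k minimizing ||y - X_S beta_S||^2."""
    n, p = X.shape
    best_rss, best_S = np.inf, None
    for S in combinations(range(p), k):
        Xs = X[:, S]
        # Least-squares fit restricted to the candidate subset S.
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_S = rss, S
    return set(best_S)


# Illustrative data: two strong signals, so BSS should recover them.
rng = np.random.default_rng(0)
n, p, k = 100, 8, 2
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[1, 4]] = [3.0, -2.0]       # true active set {1, 4}
y = X @ beta + 0.1 * rng.standard_normal(n)
print(best_subset(X, y, k))      # recovers {1, 4} with high probability
```

The enumeration over all C(p, k) subsets is what makes BSS computationally hard in general; the paper's margin and complexity conditions concern when this combinatorial estimator is model-consistent, not how to compute it efficiently.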

@article{roy2025_2301.06259,
  title={Understanding Best Subset Selection: A Tale of Two C(omplex)ities},
  author={Saptarshi Roy and Ambuj Tewari and Ziwei Zhu},
  journal={arXiv preprint arXiv:2301.06259},
  year={2025}
}