Submodularity in Statistics: Comparing the Success of Model Selection Methods

We demonstrate the usefulness of submodularity in statistics as a characterization of the difficulty of the \emph{search} problem of feature selection. The search problem concerns whether a procedure can identify an informative set of features, as opposed to how well the optimal set of features performs. Submodularity arises naturally in this setting because of its connection to combinatorial optimization. In statistics, submodularity separates cases in which collinearity makes the choice of model features difficult from those in which the task is routine. Researchers often report the signal-to-noise ratio to measure the difficulty of simulated data examples. A measure of submodularity should also be reported, as it characterizes an independent component of difficulty. Furthermore, submodularity is closely related to other statistical assumptions used in the development of the Lasso, the Dantzig selector, and sure independence screening.
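As an illustration of how collinearity can break submodularity, the sketch below checks the standard diminishing-returns condition, f(S ∪ {i}) − f(S) ≥ f(S ∪ {i, j}) − f(S ∪ {j}), for the set function f(S) = R² of regressing a response on the features in S. Using R² as the set function, the helper names (`r_squared`, `is_submodular`), and the two four-observation toy designs are all assumptions made for this example. They are not taken from the paper, and the check is not the submodularity measure the paper proposes.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def r_squared(columns, y, subset):
    """R^2 of least squares of y on the chosen columns.

    Assumes centered data and no intercept (a simplification for this toy).
    Computed by Gram-Schmidt: the explained sum of squares is the squared
    length of the projection of y onto the span of the chosen columns.
    """
    basis = []
    explained = 0.0
    for j in subset:
        q = list(columns[j])
        for b in basis:
            coef = dot(q, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        norm_sq = dot(q, q)
        if norm_sq < 1e-12:
            continue  # column lies in the span already covered
        q = [qi / norm_sq ** 0.5 for qi in q]
        basis.append(q)
        explained += dot(y, q) ** 2
    return explained / dot(y, y)


def is_submodular(f, n, tol=1e-9):
    """Brute-force diminishing-returns check over all subsets of n features."""
    for size in range(n - 1):
        for S in combinations(range(n), size):
            rest = [k for k in range(n) if k not in S]
            for i in rest:
                for j in rest:
                    if i == j:
                        continue
                    gain_small = f(S + (i,)) - f(S)
                    gain_large = f(S + (j, i)) - f(S + (j,))
                    if gain_small < gain_large - tol:
                        return False
    return True


# Two centered, mutually orthogonal directions (hypothetical toy data).
z = [1, -1, 1, -1]
e = [1, 1, -1, -1]

# Orthogonal design: features z and e, response z + e.
orth_cols = [z, e]
y_orth = [a + b for a, b in zip(z, e)]

# Collinear design: feature 0 is z + e, feature 1 is e, response z.
# Feature 1 alone explains nothing, but it cleans up feature 0.
coll_cols = [[a + b for a, b in zip(z, e)], e]
y_coll = z


def f_orth(S):
    return r_squared(orth_cols, y_orth, S)


def f_coll(S):
    return r_squared(coll_cols, y_coll, S)


print("orthogonal design submodular:", is_submodular(f_orth, 2))
print("collinear design submodular:", is_submodular(f_coll, 2))
```

In the collinear toy design, the second feature has no marginal value on its own but a large one once the first feature is included. A greedy or marginal search would therefore pass over it, even though the optimal two-feature model fits the response exactly. This is the sense in which search difficulty is separate from the performance of the best model.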