Supersparse Linear Integer Models for Interpretable Classification

27 June 2013

Abstract

Scoring systems are classification models that make predictions using a sparse linear combination of variables with integer coefficients. Such systems are frequently used because they are interpretable; that is, they only require users to add, subtract and multiply a few meaningful numbers to generate a prediction. In this work, we introduce Supersparse Linear Integer Models (SLIM) as a tool for creating highly interpretable scoring systems. SLIM is formulated as a discrete optimization problem, whose objective minimizes the misclassification rate to encourage accuracy, while regularizing the L0-norm to encourage sparsity, and the L1-norm to encourage small coefficients among equally sparse solutions. SLIM can be adapted to handle imbalanced datasets, and can incorporate additional constraints to enhance the interpretability of scoring systems. We provide demonstrations to highlight the interpretability of SLIM's scoring systems, and present experimental results to show that SLIM's scoring systems are both accurate and sparse in comparison to state-of-the-art classification models.
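
As a rough illustration of the objective described above, the sketch below evaluates a SLIM-style score for a candidate integer coefficient vector: the misclassification rate plus an L0 penalty and a much smaller L1 penalty. The function names and the weights `c0` and `eps` are hypothetical choices for this example, not values taken from the paper. For simplicity the intercept is penalized like any other coefficient, and the discrete optimization over coefficients is not shown.

```python
def score(w, x):
    """Linear score: the dot product of integer coefficients and features."""
    return sum(wj * xj for wj, xj in zip(w, x))


def slim_objective(w, X, y, c0=0.01, eps=0.001):
    """Evaluate a SLIM-style objective for coefficients w.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    c0 and eps are assumed penalty weights (hypothetical defaults).
    """
    n = len(X)
    # A point counts as misclassified when its label and score disagree
    # (a score of exactly zero is treated as an error).
    errors = sum(1 for xi, yi in zip(X, y) if yi * score(w, xi) <= 0)
    l0 = sum(1 for wj in w if wj != 0)   # number of nonzero coefficients
    l1 = sum(abs(wj) for wj in w)        # total coefficient magnitude
    return errors / n + c0 * l0 + eps * l1


# Toy data: the first column is a constant 1 acting as the intercept term.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0]]
y = [1, -1, 1, -1]

sparse = slim_objective([-1, 2, 0], X, y)   # two nonzero coefficients
dense = slim_objective([-1, 2, 1], X, y)    # three nonzero coefficients
```

On this toy data the sparser vector classifies every point correctly and pays a smaller penalty, so it receives the lower objective value. Keeping `eps` small relative to `c0` means the L1 term mainly breaks ties among equally sparse solutions.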
