Supersparse Linear Integer Models for Interpretable Classification

27 June 2013

Abstract

Scoring systems are classification models that make predictions using a sparse linear combination of variables with integer coefficients. Such systems are frequently used in medicine because they are interpretable; that is, they require users only to add, subtract, and multiply a few meaningful numbers in order to make a prediction. In this work we introduce Supersparse Linear Integer Models (SLIM) as a tool for creating highly interpretable scoring systems. SLIM is based on a discrete optimization problem, which can be solved using mixed integer programming or tabu search techniques. SLIM's optimization problem uses both an L0 norm to encourage sparsity and an L1 norm to encourage small coefficients among equally sparse solutions. SLIM can also be made to handle imbalanced data, and it can incorporate many different types of constraints on the coefficients in order to produce interpretable predictive models.
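To make the roles of the two norms concrete, the following is a minimal sketch of a scoring-system prediction and a SLIM-style objective. The exact form of the objective (misclassification rate plus weighted L0 and L1 terms), the function names, the weights `c0` and `eps`, and the toy data are all assumptions made for illustration; the abstract only states that an L0 norm and an L1 norm are used. The sketch evaluates candidate coefficient vectors and does not solve the discrete optimization problem itself.

```python
from typing import Sequence


def predict(coefs: Sequence[int], intercept: int, x: Sequence[float]) -> int:
    """Scoring-system prediction: add up integer-weighted features, then threshold at 0."""
    score = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1 if score > 0 else -1


def slim_style_objective(
    coefs: Sequence[int],
    intercept: int,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    c0: float = 0.01,   # assumed weight on the L0 term (sparsity)
    eps: float = 0.001,  # assumed small weight on the L1 term (tie-breaking)
) -> float:
    """Hypothetical objective: error rate + c0 * ||coefs||_0 + eps * ||coefs||_1."""
    errors = sum(predict(coefs, intercept, x) != label for x, label in zip(X, y))
    l0 = sum(1 for c in coefs if c != 0)  # number of nonzero coefficients
    l1 = sum(abs(c) for c in coefs)       # total coefficient magnitude
    return errors / len(X) + c0 * l0 + eps * l1


# Toy data (made up for illustration): labels are +1 / -1.
X = [[1, 0, 3], [0, 1, 1], [1, 1, 0], [0, 0, 2]]
y = [1, -1, -1, 1]

# Two equally accurate, equally sparse models; the second just doubles the first.
small = slim_style_objective([0, -2, 1], 0, X, y)
large = slim_style_objective([0, -4, 2], 0, X, y)
```

Both candidate models classify the toy data identically and use the same number of nonzero coefficients, so only the L1 term separates them, and the model with the smaller coefficients receives the lower objective value.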
