As they play an increasingly important role in determining access to credit, credit scoring models are under growing scrutiny from banking supervisors and internal model validators. These authorities need to monitor the model performance and identify its key drivers. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, ) into specific contributions associated with the various features of a forecasting model. XPER is theoretically grounded on Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine-learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Notably, the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). Finally, we show how XPER can be used to deal with heterogeneity issues and improve performance.
View on arXiv