A hierarchical decomposition for explaining ML performance discrepancies

Abstract

Machine learning (ML) algorithms can often differ in performance across domains. Understanding why their performance differs is crucial for determining which types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gap. Existing methods focus on aggregate decompositions of the total performance gap into the impact of a shift in the distribution of features p(X) versus the impact of a shift in the conditional distribution of the outcome p(Y|X); however, such coarse explanations offer only a few options for closing the gap. Detailed variable-level decompositions, which quantify the importance of each variable to each term in the aggregate decomposition, can provide a much deeper understanding and suggest more targeted interventions. However, existing methods for these decompositions assume knowledge of the full causal graph or make strong parametric assumptions. We introduce a nonparametric hierarchical framework that provides both aggregate and detailed decompositions for explaining why the performance of an ML algorithm differs across domains, without requiring causal knowledge. We derive debiased, computationally efficient estimators and statistical inference procedures for asymptotically valid confidence intervals.
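
As a minimal illustration of the aggregate decomposition mentioned in the abstract (not the paper's hierarchical, debiased estimator), the sketch below splits the gap in expected squared loss between a source domain S and a target domain T as gap = (E_{p_T(X) p_S(Y|X)}[loss] - E_S[loss]) + (E_T[loss] - E_{p_T(X) p_S(Y|X)}[loss]), where the first term reflects the shift in p(X) and the second the shift in p(Y|X). It assumes the density ratio w(x) = p_T(x) / p_S(x) is known exactly, which holds only for this synthetic Gaussian example; in practice w must be estimated. The function name aggregate_decomposition, the fixed predictor, and the data-generating process are all hypothetical.

```python
import numpy as np


def aggregate_decomposition(x_src, y_src, x_tgt, y_tgt, predict, density_ratio):
    """Split E_T[loss] - E_S[loss] into covariate-shift and conditional-shift terms.

    density_ratio(x) must return p_T(x) / p_S(x) (assumed known in this sketch).
    """
    loss_src = (y_src - predict(x_src)) ** 2  # squared loss on source samples
    loss_tgt = (y_tgt - predict(x_tgt)) ** 2  # squared loss on target samples
    # Risk under the hybrid distribution p_T(X) p_S(Y|X), via importance weighting of source losses
    hybrid_risk = np.mean(density_ratio(x_src) * loss_src)
    covariate_term = hybrid_risk - loss_src.mean()    # effect of shifting p(X)
    conditional_term = loss_tgt.mean() - hybrid_risk  # effect of shifting p(Y|X)
    total_gap = loss_tgt.mean() - loss_src.mean()
    return total_gap, covariate_term, conditional_term


def predict(x):
    # Hypothetical fixed predictor, slightly misspecified even on the source domain
    return 2.0 * x


def density_ratio(x):
    # Exact ratio of the N(1, 1) and N(0, 1) densities: exp(x - 1/2)
    return np.exp(x - 0.5)


rng = np.random.default_rng(0)
n = 200_000
x_src = rng.normal(0.0, 1.0, n)                                 # source: X ~ N(0, 1)
y_src = 2.5 * x_src + rng.normal(0.0, 1.0, n)
x_tgt = rng.normal(1.0, 1.0, n)                                 # target: X ~ N(1, 1)
y_tgt = 2.5 * x_tgt + 0.5 * x_tgt**2 + rng.normal(0.0, 1.0, n)  # p(Y|X) also shifts

total, cov, cond = aggregate_decomposition(x_src, y_src, x_tgt, y_tgt, predict, density_ratio)
print(f"total gap {total:.3f} = covariate {cov:.3f} + conditional {cond:.3f}")
```

For this data-generating process the population values are 0.25 for the covariate-shift term and 4.5 for the conditional-shift term (total gap 4.75), so the sample estimates should land near these. The split depends on which hybrid distribution is used as the intermediate step; using p_S(X) p_T(Y|X) instead would generally give different terms.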
