
Leveraging Predictive Equivalence in Decision Trees

8 pages main text, 24 pages appendix, 4 pages bibliography, 25 figures, 5 tables
Abstract

Decision trees are widely used for interpretable machine learning due to their clearly structured reasoning process. However, this structure belies a challenge we refer to as predictive equivalence: a given tree's decision boundary can be represented by many different decision trees. The presence of models with identical decision boundaries but different evaluation processes makes model selection challenging. The models will exhibit different variable importances and behave differently in the presence of missing values, yet most optimization procedures will arbitrarily choose one such model to return. We present a Boolean logical representation of decision trees that does not exhibit predictive equivalence and is faithful to the underlying decision boundary. We apply our representation to several downstream machine learning tasks. Using our representation, we show that decision trees are surprisingly robust to test-time missingness of feature values; we address predictive equivalence's impact on quantifying variable importance; and we present an algorithm to optimize the cost of reaching predictions.
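To make predictive equivalence concrete, the following minimal sketch (not the authors' code; the trees and the disjunctive-normal-form expression are illustrative assumptions) shows two decision trees with different split orders that realize the same decision boundary, which a single Boolean expression captures exactly.

from itertools import product

def tree_a(x1, x2):
    # Tree A: splits on x1 first, then x2.
    if x1 == 1:
        return 1
    return 1 if x2 == 1 else 0

def tree_b(x1, x2):
    # Tree B: splits on x2 first, then x1 -- a different evaluation process.
    if x2 == 1:
        return 1
    return 1 if x1 == 1 else 0

def dnf(x1, x2):
    # Boolean (disjunctive normal form) representation of the shared boundary.
    return int(x1 or x2)

# Both trees agree with the Boolean form on every input, so they are
# predictively equivalent even though their structures differ.
for x1, x2 in product([0, 1], repeat=2):
    assert tree_a(x1, x2) == tree_b(x1, x2) == dnf(x1, x2)
print("Both trees realize the same decision boundary: x1 OR x2")

Although the two trees agree everywhere, the one that splits on x1 first can still return a prediction when x2 is missing (and vice versa), and split-based importance scores differ between them; a representation defined directly on the Boolean decision boundary sidesteps this arbitrariness.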

@article{mctavish2025_2506.14143,
  title={Leveraging Predictive Equivalence in Decision Trees},
  author={Hayden McTavish and Zachery Boner and Jon Donnelly and Margo Seltzer and Cynthia Rudin},
  journal={arXiv preprint arXiv:2506.14143},
  year={2025}
}