On Completeness-aware Concept-Based Explanations in Deep Neural Networks
- FAtt

Human explanations of high-level decisions are often expressed in terms of the key concepts those decisions are based on. In this paper, we study such concept-based explainability for Deep Neural Networks (DNNs). First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is for explaining a model's prediction behavior. Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, addressing limitations of commonly used methods such as PCA and TCAV. To define an importance score for each discovered concept, we adapt game-theoretic notions to aggregate over sets and propose ConceptSHAP. On a synthetic dataset with ground-truth concept explanations, on a real-world dataset, and with a user study, we validate the effectiveness of our framework in finding concepts that are both complete in explaining the model's decisions and interpretable.
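The abstract describes ConceptSHAP as a game-theoretic aggregation over concept sets, which suggests a Shapley-value-style score with the completeness metric as the value function. Below is a minimal sketch of that idea under this assumption; `completeness` is a hypothetical helper standing in for the paper's completeness score, and the brute-force subset enumeration is for illustration only, not the authors' implementation.

```python
import math
from itertools import combinations

def concept_shap(concepts, completeness):
    """Shapley-style importance score for each concept.

    `completeness(subset)` is assumed to return a scalar measuring how well
    the given subset of concepts explains the model's predictions.
    """
    m = len(concepts)
    scores = []
    for i, c in enumerate(concepts):
        rest = concepts[:i] + concepts[i + 1:]
        s_i = 0.0
        # Weighted average of the marginal gain from adding concept c
        # to every subset S of the remaining concepts.
        for k in range(m):
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for subset in combinations(rest, k):
                s_i += weight * (
                    completeness(list(subset) + [c]) - completeness(list(subset))
                )
        scores.append(s_i)
    return scores
```

Because the number of subsets grows exponentially with the number of concepts, an exact computation like this is only practical for small concept sets; sampling-based Shapley approximations are the usual workaround.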