
Diagnosing AI Explanation Methods with Folk Concepts of Behavior

Conference on Fairness, Accountability, and Transparency (FAccT), 2022
Abstract

When explaining AI behavior to humans, how does a human explainee comprehend the communicated information, and does it match what the explanation attempted to communicate? When can we say that an explanation is actually explaining something? We aim to provide an answer by leveraging the theory-of-mind literature on the folk concepts that humans use to understand behavior. We establish a framework of social attribution by the human explainee, which describes the function of explanations: the information that humans comprehend from them. Specifically, effective explanations should be coherent (produce mental models whose information generalizes to other contrast cases), complete (communicate an explicit causal narrative of a contrast case, representation causes, affected representation, and external causes), and interactive (surface and resolve contradictions to the generalization property through interrogation). We demonstrate that many XAI mechanisms can be mapped to folk concepts of behavior. This allows us to uncover the failure modes that prevent current methods from explaining effectively, and to identify what is necessary to enable coherent explanations.
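The framework itself is conceptual, but a toy sketch may help ground two of its terms. The Python snippet below is not from the paper; every name in it (the loan model, its thresholds, the helper functions) is a hypothetical illustration. It shows what an explicit contrast case looks like ("why deny rather than approve?") and a miniature version of the coherence property: the cause an explanation cites should generalize to other contrast cases, not hold only for the one instance being explained.

```python
# Illustrative sketch only; all names and thresholds are hypothetical.
from itertools import combinations

def predict(features):
    """Toy rule-based loan model: approve iff income and credit both pass."""
    return "approve" if features["income"] >= 50 and features["credit"] >= 600 else "deny"

def contrastive_explanation(features, foil):
    """Find the smallest set of feature changes that flips the prediction to `foil`.

    This plays the role of an explicit contrast case: it answers
    "why <fact> rather than <foil>?" with the causes separating the two outcomes.
    """
    targets = {"income": 50, "credit": 600}  # the thresholds the toy model uses
    for size in range(1, len(targets) + 1):
        for subset in combinations(targets, size):
            changed = dict(features, **{k: targets[k] for k in subset})
            if predict(changed) == foil:
                return {k: (features[k], targets[k]) for k in subset}
    return None

applicant = {"income": 30, "credit": 650}
print(predict(applicant))                             # deny
print(contrastive_explanation(applicant, "approve"))  # {'income': (30, 50)}

# Coherence in miniature: the cited cause ("income below 50") should also
# explain other denials where income is the separating factor; if it does not,
# the mental model the explainee generalizes from it is wrong.
other = {"income": 20, "credit": 700}
assert contrastive_explanation(other, "approve") == {"income": (20, 50)}
```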
