
Explaining Representation by Mutual Information

Abstract

As interpretability gains attention in machine learning, there is a growing need for reliable methods that fully explain the content of learned representations. We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information. This theoretically complete framework captures the entire input-representation relationship, going beyond partial explanations such as those produced by Grad-CAM. Using two lightweight modules integrated into architectures such as CNNs and Transformers, we estimate these components and demonstrate their interpretive power through visualizations on ResNet and a prototype network applied to image classification and few-shot learning tasks. Our approach is distinguished by three key features: (1) rooted in mutual information theory, it delivers a thorough and theoretically grounded interpretation that exceeds the scope of existing interpretability methods; (2) unlike conventional methods that focus on explaining decisions, it centers on interpreting representations; (3) it integrates seamlessly into pre-existing network architectures, requiring only fine-tuning of the inserted modules.
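The abstract does not spell out how the two lightweight modules estimate these MI quantities, so the following is a minimal sketch, assuming a MINE-style (Donsker-Varadhan) critic attached to a frozen ResNet-18 backbone. The class name MICritic, the pooled input view, and all dimensions are hypothetical; the sketch only illustrates the "insert a small module and fine-tune it alone" integration the abstract describes, not the paper's actual estimator.

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class MICritic(nn.Module):
    """Hypothetical critic scoring (input view, representation) pairs.

    Follows a MINE-style (Donsker-Varadhan) estimator purely as an
    illustration; the paper's two lightweight modules are not specified
    in the abstract.
    """

    def __init__(self, x_dim: int, z_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=-1))


def mi_lower_bound(critic: nn.Module, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan lower bound on I(X; Z).

    Joint samples are the paired (x, z); marginal samples are simulated by
    shuffling z within the batch.
    """
    joint = critic(x, z).mean()
    z_marginal = z[torch.randperm(z.size(0))]
    log_mean_exp = torch.logsumexp(critic(x, z_marginal).squeeze(-1), dim=0) - math.log(z.size(0))
    return joint - log_mean_exp


# Frozen pretrained backbone: only the inserted module is trained,
# mirroring the "fine-tune only the inserted modules" integration.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()          # expose the 512-d penultimate representation
for p in backbone.parameters():
    p.requires_grad_(False)

images = torch.randn(32, 3, 224, 224)                    # dummy batch
x_view = F.adaptive_avg_pool2d(images, 4).flatten(1)     # coarse 48-d view of the input
with torch.no_grad():
    z = backbone(images)                                 # (32, 512) representations

critic = MICritic(x_dim=x_view.size(1), z_dim=z.size(1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

opt.zero_grad()
loss = -mi_lower_bound(critic, x_view, z)   # maximize the MI lower bound
loss.backward()
opt.step()
print(f"estimated I(X; Z) lower bound: {-loss.item():.4f}")

In practice the trained critic would be run over held-out batches to report the estimated MI components; the same pattern (frozen backbone, small trainable head) applies to a Transformer or prototype-network backbone.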

@article{gu2025_2103.15114,
  title={Explaining Representation by Mutual Information},
  author={Lifeng Gu},
  journal={arXiv preprint arXiv:2103.15114},
  year={2025}
}