Revisiting the robustness of post-hoc interpretability methods

29 July 2024

Jiawen Wei

Hugues Turbé

G. Mengaldo

AAML

Main:10 Pages

7 Figures

Bibliography:2 Pages

4 Tables

Appendix:5 Pages

Abstract

Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint the portions of the data that a trained deep learning model deemed important in making a decision. However, different post-hoc interpretability methods often provide different results, casting doubt on their accuracy. For this reason, several evaluation strategies have been proposed to assess the accuracy of post-hoc interpretability methods. Many of these strategies provide a coarse-grained assessment: they evaluate how the performance of the model degrades on average when different data points are corrupted across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level (also referred to as fine-grained) assessment. In other words, they do not measure the robustness of post-hoc interpretability methods. We propose an approach and two new metrics to provide a fine-grained assessment of post-hoc interpretability methods. We show that the robustness of a method is generally linked to its coarse-grained performance.
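
The distinction between a coarse-grained (averaged) and a fine-grained (per-sample) assessment can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy linear model, the |input × weight| relevance score, zeroing as the corruption, and the names `model_score`, `relevance`, and `score_drop_per_sample` are hypothetical. It does not reproduce the approach or the two metrics proposed in the paper, which the abstract does not specify.

```python
import numpy as np

def model_score(x, w):
    """Toy stand-in for a trained model: sigmoid of a weighted sum."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def relevance(x, w):
    """Toy stand-in for a post-hoc method: |input * weight| per feature."""
    return np.abs(x * w)

def score_drop_per_sample(x, w, k):
    """Zero out the k most relevant features of each sample and
    return the per-sample change in model score (original - corrupted)."""
    rel = relevance(x, w)
    # Indices of the k highest-relevance features in each row.
    top_k = np.argsort(-rel, axis=1)[:, :k]
    x_corrupted = x.copy()
    np.put_along_axis(x_corrupted, top_k, 0.0, axis=1)
    return model_score(x, w) - model_score(x_corrupted, w)

rng = np.random.default_rng(0)
w = rng.normal(size=8)          # assumed model weights
x = rng.normal(size=(100, 8))   # assumed evaluation samples

drops = score_drop_per_sample(x, w, k=3)

# Coarse-grained view: a single number averaged over all samples.
coarse = drops.mean()

# Fine-grained view: keep the per-sample values and inspect their spread.
fine_spread = drops.std()
```

Two methods with a similar `coarse` value can differ widely in `fine_spread`; that sample-level variation is the kind of information an averaged assessment discards.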
