Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation

9 December 2022
Julius Adebayo, M. Muelly, H. Abelson, Been Kim

Papers citing "Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation"

50 / 58 papers shown
In defence of post-hoc explanations in medical AI
Joshua Hatherley, Lauritz Munch, Jens Christian Bjerring
64 · 0 · 0 · 29 Apr 2025
Probabilistic Stability Guarantees for Feature Attributions
Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong
92 · 0 · 0 · 18 Apr 2025
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable
Shreyash Arya, Sukrut Rao, Moritz Bohle, Bernt Schiele
143 · 3 · 0 · 28 Jan 2025
Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation
Hugo Porta, Emanuele Dalsasso, Diego Marcos, D. Tuia
223 · 0 · 0 · 14 Sep 2024
"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders Sandholm, Katja Filippova
130 · 78 · 0 · 14 Nov 2021
Finding and Fixing Spurious Patterns with Explanations
Gregory Plumb, Marco Tulio Ribeiro, Ameet Talwalkar
69 · 42 · 0 · 03 Jun 2021
The effectiveness of feature attribution methods and its correlation with automatic evaluation scores
Giang Nguyen, Daeyoung Kim, Anh Totti Nguyen
FAtt
105 · 89 · 0 · 31 May 2021
Sanity Simulations for Saliency Methods
Joon Sik Kim, Gregory Plumb, Ameet Talwalkar
FAtt
60 · 17 · 0 · 13 May 2021
Do Feature Attribution Methods Correctly Attribute Features?
Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, J. Shah
FAtt, XAI
71 · 133 · 0 · 27 Apr 2021
Do Input Gradients Highlight Discriminative Features?
Harshay Shah, Prateek Jain, Praneeth Netrapalli
AAML, FAtt
57 · 59 · 0 · 25 Feb 2021
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Han Guo, Nazneen Rajani, Peter Hase, Joey Tianyi Zhou, Caiming Xiong
TDI
79 · 112 · 0 · 31 Dec 2020
Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately
Fereshte Khani, Percy Liang
FaML
49 · 65 · 0 · 07 Dec 2020
Debugging Tests for Model Explanations
Julius Adebayo, M. Muelly, Ilaria Liccardi, Been Kim
FAtt
64 · 181 · 0 · 10 Nov 2020
Understanding the Failure Modes of Out-of-Distribution Generalization
Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur
OOD, OODD
48 · 177 · 0 · 29 Oct 2020
Now You See Me (CME): Concept-based Model Extraction
Dmitry Kazhdan, B. Dimanov, M. Jamnik, Pietro Lio, Adrian Weller
46 · 75 · 0 · 25 Oct 2020
How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels
Hua Shen, Ting-Hao 'Kenneth' Huang
FAtt, HAI
54 · 56 · 0 · 26 Aug 2020
Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction
Eric Chu, D. Roy, Jacob Andreas
FAtt, LRM
49 · 71 · 0 · 23 Jul 2020
Debiasing Concept-based Explanations with Causal Analysis
M. T. Bahadori, David Heckerman
FAtt, CML
55 · 39 · 0 · 22 Jul 2020
Fairwashing Explanations with Off-Manifold Detergent
Christopher J. Anders, Plamen Pasliev, Ann-Kathrin Dombrowski, K. Müller, Pan Kessel
FAtt, FaML
42 · 97 · 0 · 20 Jul 2020
Concept Bottleneck Models
Pang Wei Koh, Thao Nguyen, Y. S. Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
94 · 818 · 0 · 09 Jul 2020
Influence Functions in Deep Learning Are Fragile
S. Basu, Phillip E. Pope, Soheil Feizi
TDI
99 · 230 · 0 · 25 Jun 2020
Noise or Signal: The Role of Image Backgrounds in Object Recognition
Kai Y. Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry
131 · 387 · 0 · 17 Jun 2020
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
MILM, FAtt, AAML, TDI
55 · 171 · 0 · 14 May 2020
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang
182 · 379 · 0 · 09 May 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos, J. Jacobsen, Claudio Michaelis, R. Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann
198 · 2,044 · 0 · 16 Apr 2020
Estimating Training Data Influence by Tracing Gradient Descent
G. Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale
TDI
68 · 404 · 0 · 19 Feb 2020
Concept Whitening for Interpretable Image Recognition
Zhi Chen, Yijie Bei, Cynthia Rudin
FAtt
63 · 320 · 0 · 05 Feb 2020
Evaluating Saliency Map Explanations for Convolutional Neural Networks: A User Study
Ahmed Alqaraawi, M. Schuessler, Philipp Weiß, Enrico Costanza, N. Bianchi-Berthouze
AAML, FAtt, XAI
61 · 200 · 0 · 03 Feb 2020
Sanity Checks for Saliency Metrics
Richard J. Tomsett, Daniel Harborne, Supriyo Chakraborty, Prudhvi K. Gurram, Alun D. Preece
XAI
67 · 169 · 0 · 29 Nov 2019
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang
OOD
85 · 1,236 · 0 · 20 Nov 2019
"How do I fool you?": Manipulating User Trust via Misleading Black Box
  Explanations
"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations
Himabindu Lakkaraju
Osbert Bastani
56
254
0
15 Nov 2019
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation
  Methods
Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
Dylan Slack
Sophie Hilgard
Emily Jia
Sameer Singh
Himabindu Lakkaraju
FAtt
AAML
MLAU
66
817
0
06 Nov 2019
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh
Been Kim
Sercan O. Arik
Chun-Liang Li
Tomas Pfister
Pradeep Ravikumar
FAtt
210
304
0
17 Oct 2019
Interpretations are useful: penalizing explanations to align neural
  networks with prior knowledge
Interpretations are useful: penalizing explanations to align neural networks with prior knowledge
Laura Rieger
Chandan Singh
W. James Murdoch
Bin Yu
FAtt
78
214
0
30 Sep 2019
Improving performance of deep learning models with axiomatic attribution
  priors and expected gradients
Improving performance of deep learning models with axiomatic attribution priors and expected gradients
G. Erion
Joseph D. Janizek
Pascal Sturmfels
Scott M. Lundberg
Su-In Lee
OOD
BDL
FAtt
52
81
0
25 Jun 2019
Explanations can be manipulated and geometry is to blame
Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, M. Ackermann, K. Müller, Pan Kessel
AAML, FAtt
78 · 330 · 0 · 19 Jun 2019
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Sebastian Lapuschkin, S. Wäldchen, Alexander Binder, G. Montavon, Wojciech Samek, K. Müller
84 · 1,009 · 0 · 26 Feb 2019
Transfusion: Understanding Transfer Learning for Medical Imaging
M. Raghu, Chiyuan Zhang, Jon M. Kleinberg, Samy Bengio
MedIm
75 · 982 · 0 · 14 Feb 2019
Fooling Neural Network Interpretations via Adversarial Model Manipulation
Juyeon Heo, Sunghwan Joo, Taesup Moon
AAML, FAtt
88 · 202 · 0 · 06 Feb 2019
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix Wichmann, Wieland Brendel
96 · 2,662 · 0 · 29 Nov 2018
Representer Point Selection for Explaining Deep Neural Networks
Chih-Kuan Yeh, Joon Sik Kim, Ian En-Hsu Yen, Pradeep Ravikumar
TDI
64 · 251 · 0 · 23 Nov 2018
Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables
Giovanni Sutanto, J. Zech, Luke Oakden-Rayner, Yevgen Chebotar, Manway Liu, William Gale, M. McConnell, Ankur Handa, Thomas M. Snyder, Dieter Fox
AI4CE, OOD
74 · 244 · 0 · 08 Nov 2018
Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, M. Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
FAtt, AAML, XAI
123 · 1,963 · 0 · 08 Oct 2018
A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker, D. Erhan, Pieter-Jan Kindermans, Been Kim
FAtt, UQCV
98 · 681 · 0 · 28 Jun 2018
A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations
Weili Nie, Yang Zhang, Ankit B. Patel
FAtt
120 · 151 · 0 · 18 May 2018
Manipulating and Measuring Model Interpretability
Forough Poursabzi-Sangdeh, D. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, Hanna M. Wallach
84 · 697 · 0 · 21 Feb 2018
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie J. Cai, James Wexler, F. Viégas, Rory Sayres
FAtt
199 · 1,837 · 0 · 30 Nov 2017
Interpretation of Neural Networks is Fragile
Amirata Ghorbani, Abubakar Abid, James Zou
FAtt, AAML
124 · 865 · 0 · 29 Oct 2017
SmoothGrad: removing noise by adding noise
D. Smilkov, Nikhil Thorat, Been Kim, F. Viégas, Martin Wattenberg
FAtt, ODL
199 · 2,221 · 0 · 12 Jun 2017
Network Dissection: Quantifying Interpretability of Deep Visual Representations
David Bau, Bolei Zhou, A. Khosla, A. Oliva, Antonio Torralba
MILM, FAtt
132 · 1,514 · 1 · 19 Apr 2017