The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase, Harry Xie, Joey Tianyi Zhou (1 June 2021)
arXiv: 2106.00786
Tags: OODD, LRM, FAtt
Papers citing "The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations" (50 of 64 papers shown):
Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
Vasiliki Papanikou, Danae Pla Karidi, E. Pitoura, Emmanouil Panagiotou, Eirini Ntoutsi (01 May 2025)

Probabilistic Stability Guarantees for Feature Attributions
Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong (18 Apr 2025)

PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
Jongseo Lee, Wooil Lee, Gyeong-Moon Park, Seong Tae Kim, Jinwoo Choi (17 Apr 2025)

Towards Spatially-Aware and Optimally Faithful Concept-Based Explanations
Shubham Kumar, Dwip Dalal, Narendra Ahuja (15 Apr 2025)

Noiser: Bounded Input Perturbations for Attributing Large Language Models
Mohammad Reza Ghasemi Madani, Aryo Pradipta Gema, Gabriele Sarti, Yu Zhao, Pasquale Minervini, Andrea Passerini (03 Apr 2025) [AAML]

Self-Explaining Neural Networks for Business Process Monitoring
Shahaf Bassan, Shlomit Gur, Sergey Zeltyn, Konstantinos Mavrogiorgos, Ron Eliav, Dimosthenis Kyriazis (23 Mar 2025)

Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov (14 Mar 2025)

A Tale of Two Imperatives: Privacy and Explainability
Supriya Manna, Niladri Sett (30 Dec 2024)

From Flexibility to Manipulation: The Slippery Slope of XAI Evaluation
Kristoffer Wickstrøm, Marina M.-C. Höhne, Anna Hedström (07 Dec 2024) [AAML]

Benchmarking XAI Explanations with Human-Aligned Evaluations
Rémi Kazmierczak, Steve Azzolin, Eloise Berthier, Anna Hedström, Patricia Delhomme, ..., Goran Frehse, Massimiliano Mancini, Baptiste Caramiaux, Andrea Passerini, Gianni Franchi (04 Nov 2024)

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria (18 Oct 2024)

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Xu Zheng, Farhad Shirani, Zhuomin Chen, Chaohao Lin, Wei Cheng, Wenbo Guo, Dongsheng Luo (03 Oct 2024) [AAML]

Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations
Supriya Manna, Niladri Sett (26 Sep 2024) [AAML]

Optimal ablation for interpretability
Maximilian Li, Lucas Janson (16 Sep 2024) [FAtt]

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh (21 Aug 2024)

Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability
Joakim Edin, Andreas Geert Motzfeldt, Casper L. Christensen, Tuukka Ruotsalo, Lars Maaløe, Maria Maistro (15 Aug 2024)

Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation
Guy Amir, Shahaf Bassan, Guy Katz (07 Aug 2024)

BEExAI: Benchmark to Evaluate Explainable AI
Samuel Sithakoul, Sara Meftah, Clément Feutry (29 Jul 2024)

Benchmarking the Attribution Quality of Vision Models
Robin Hesse, Simone Schaub-Meyer, Stefan Roth (16 Jul 2024) [FAtt]

Noise-Free Explanation for Driving Action Prediction
Hongbo Zhu, Theodor Wulff, R. S. Maharjan, Jinpei Han, Angelo Cangelosi (08 Jul 2024) [AAML, FAtt]

Efficient and Accurate Explanation Estimation with Distribution Compression
Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek (26 Jun 2024) [FAtt]

How to use and interpret activation patching
Stefan Heimersheim, Neel Nanda (23 Apr 2024)

CA-Stream: Attention-based pooling for interpretable image recognition
Felipe Torres, Hanwei Zhang, R. Sicre, Stéphane Ayache, Yannis Avrithis (23 Apr 2024)

On the Faithfulness of Vision Transformer Explanations
Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan (01 Apr 2024)

Attention Meets Post-hoc Interpretability: A Mathematical Perspective
Gianluigi Lopardo, F. Precioso, Damien Garreau (05 Feb 2024)

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
Zhixue Zhao, Boxuan Shan (01 Feb 2024)

Explaining Time Series via Contrastive and Locally Sparse Perturbations
Zichuan Liu, Yingying Zhang, Tianchun Wang, Zefan Wang, Dongsheng Luo, ..., Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, Qingsong Wen (16 Jan 2024)

Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI Benchmarks
Stefan Blücher, Johanna Vielhaben, Nils Strodthoff (12 Jan 2024) [AAML]

The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?
T. Besiroglu, S. Bergerson, Amelia Michael, Lennart Heim, Xueyun Luo, Neil Thompson (04 Jan 2024)

Faithful and Robust Local Interpretability for Textual Predictions
Gianluigi Lopardo, F. Precioso, Damien Garreau (30 Oct 2023) [OOD]

Faithfulness Measurable Masked Language Models
Andreas Madsen, Siva Reddy, Sarath Chandar (11 Oct 2023)

Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan (10 Oct 2023) [XAI]

Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks
Xu Zheng, Farhad Shirani, Tianchun Wang, Wei Cheng, Zhuomin Chen, Haifeng Chen, Hua Wei, Dongsheng Luo (03 Oct 2023)

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda (27 Sep 2023) [LLMSV]

Goodhart's Law Applies to NLP's Explanation Benchmarks
Jennifer Hsia, Danish Pruthi, Aarti Singh, Zachary Chase Lipton (28 Aug 2023)

FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
Robin Hesse, Simone Schaub-Meyer, Stefan Roth (11 Aug 2023) [AAML]

Decoding Layer Saliency in Language Transformers
Elizabeth M. Hou, Greg Castañón (09 Aug 2023) [MILM]

The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers
Meike Nauta, Christin Seifert (26 Jul 2023)

Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Owen Queen, Thomas Hartvigsen, Teddy Koker, Huan He, Theodoros Tsiligkaridis, Marinka Zitnik (03 Jun 2023) [AI4TS]

Efficient Shapley Values Estimation by Amortization for Text Classification
Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, Bing Xiang (31 May 2023) [FAtt, VLM]

On the Impact of Knowledge Distillation for Model Interpretability
Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon (25 May 2023)

Understanding Post-hoc Explainers: The Case of Anchors
Gianluigi Lopardo, F. Precioso, Damien Garreau (15 Mar 2023) [FAtt]

Relational Local Explanations
V. Borisov, Gjergji Kasneci (23 Dec 2022) [FAtt]

CRAFT: Concept Recursive Activation FacTorization for Explainability
Thomas Fel, Agustin Picard, Louis Bethune, Thibaut Boissin, David Vigouroux, Julien Colin, Rémi Cadène, Thomas Serre (17 Nov 2022)

What Makes a Good Explanation?: A Harmonized View of Properties of Explanations
Zixi Chen, Varshini Subhash, Marton Havasi, Weiwei Pan, Finale Doshi-Velez (10 Nov 2022) [XAI, FAtt]

Learning Unsupervised Hierarchies of Audio Concepts
Darius Afchar, Romain Hennequin, Vincent Guigue (21 Jul 2022)

BASED-XAI: Breaking Ablation Studies Down for Explainable Artificial Intelligence
Isha Hameed, Samuel Sharpe, Daniel Barcklow, Justin Au-yeung, Sahil Verma, Jocelyn Huang, Brian Barr, C. Bayan Bruss (12 Jul 2022)

TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh (08 Jul 2022)

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Zhuofan Ying, Peter Hase, Joey Tianyi Zhou (22 Jun 2022) [LRM]
Explanation-based Counterfactual Retraining (XCR): A Calibration Method for Black-box Models
Liu Zhendong, Wenyu Jiang, Yan Zhang, Chongjun Wang (22 Jun 2022) [CML]