The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

12 October 2020

Papers citing "The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?"

50 / 93 papers shown

Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation Yuhao Wang Ruiyang Ren Yucheng Wang Wayne Xin Zhao Jing Liu Hua Wu Haifeng Wang 2 0 0 17 May 2025
On Explaining (Large) Language Models For Code Using Global Code-Based Explanations David Nader-Palacio Dipin Khati Daniel Rodríguez-Cárdenas Alejandro Velasco Denys Poshyvanyk LRM 47 0 0 21 Mar 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation Duc Hau Nguyen Cyrielle Mallart Guillaume Gravier Pascale Sébillot 60 0 0 22 Jan 2025
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach Nils Palumbo Ravi Mangal Zifan Wang Saranya Vijayakumar Corina S. Pasareanu Somesh Jha 41 1 0 18 Jul 2024
A look under the hood of the Interactive Deep Learning Enterprise (No-IDLE) Daniel Sonntag Michael Barz Thiago S. Gouvêa VLM 52 4 0 27 Jun 2024
Interpretability Needs a New Paradigm Andreas Madsen Himabindu Lakkaraju Siva Reddy Sarath Chandar 39 4 0 08 May 2024
Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models Marvin Pafla Kate Larson Mark Hancock 43 6 0 11 Apr 2024
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models Igor Tufanov Karen Hambardzumyan Javier Ferrando Elena Voita KELM 33 6 0 10 Apr 2024
On the Faithfulness of Vision Transformer Explanations Junyi Wu Weitai Kang Hao Tang Yuan Hong Yan Yan 27 6 0 01 Apr 2024
Towards Explainability in Legal Outcome Prediction Models Josef Valvoda Ryan Cotterell ELM AILaw 55 4 0 25 Mar 2024
Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models Zhixue Zhao Nikolaos Aletras 37 3 0 19 Mar 2024
Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics Tyler A. Chang Katrin Tomanek Jessica Hoffmann Nithum Thain Erin van Liemt Kathleen Meier-Hellstern Lucas Dixon 41 7 0 13 Mar 2024
Information Flow Routes: Automatically Interpreting Language Models at Scale Javier Ferrando Elena Voita 54 35 0 27 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective Gianluigi Lopardo F. Precioso Damien Garreau 16 4 0 05 Feb 2024
Approximate Attributions for Off-the-Shelf Siamese Transformers Lucas Moller Dmitry Nikolaev Sebastian Padó 29 4 0 05 Feb 2024
ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models Zhixue Zhao Boxuan Shan 26 5 0 01 Feb 2024
XAI for In-hospital Mortality Prediction via Multimodal ICU Data Xingqiao Li Jindong Gu Zhiyong Wang Yancheng Yuan Bo Du Fengxiang He 35 2 0 29 Dec 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue Aron Molnar Jaap Jumelet Mario Giulianelli Arabella J. Sinclair 33 2 0 21 Nov 2023
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups Weiqiu You Helen Qu Marco Gatti Bhuvnesh Jain Eric Wong FAtt FaML 45 4 0 25 Oct 2023
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization Mohammad Reza Ghasemi Madani Pasquale Minervini 35 4 0 22 Oct 2023
An Attribution Method for Siamese Encoders Lucas Moller Dmitry Nikolaev Sebastian Padó 19 4 0 09 Oct 2023
Quantifying the Plausibility of Context Reliance in Neural Machine Translation Gabriele Sarti Grzegorz Chrupala Malvina Nissim Arianna Bisazza 34 5 0 02 Oct 2023
Attention Sorting Combats Recency Bias In Long Context Language Models A. Peysakhovich Adam Lerer LRM RALM 39 42 0 28 Sep 2023
Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery Maarten Burger R. Wijnhoven Shaodi You 14 1 0 19 Sep 2023
Unsupervised Text Style Transfer with Deep Generative Models Zhongtao Jiang Yuanzhe Zhang Yiming Ju Kang Liu 27 0 0 31 Aug 2023
Decoding Layer Saliency in Language Transformers Elizabeth M. Hou Greg Castañón MILM 6 0 0 09 Aug 2023
ALens: An Adaptive Domain-Oriented Abstract Writing Training Tool for Novice Researchers Chen Cheng Ziang Li Zhenhui Peng Quan Li 24 0 0 08 Aug 2023
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction Haotian Chen Bingsheng Chen Xiangdong Zhou 45 6 0 20 Jun 2023
B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers Moritz D Boehle Navdeeppal Singh Mario Fritz Bernt Schiele 59 27 0 19 Jun 2023
Using Sequences of Life-events to Predict Human Lives Germans Savcisens Tina Eliassi-Rad L. K. Hansen L. Mortensen Lau Lilleholt Anna Rogers Ingo Zettler Sune Lehmann AI4TS 39 36 0 05 Jun 2023
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition Ali Modarressi Mohsen Fayyaz Ehsan Aghazadeh Yadollah Yaghoobzadeh Mohammad Taher Pilehvar 25 26 0 05 Jun 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap Q. V. Liao J. Vaughan 42 158 0 02 Jun 2023
HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation David Dale Elena Voita Janice Lam Prangthip Hansanti C. Ropers Elahe Kalbassi Cynthia Gao Loïc Barrault Marta R. Costa-jussá HILM 32 27 0 19 May 2023
Incorporating Attribution Importance for Improving Faithfulness Metrics Zhixue Zhao Nikolaos Aletras 21 13 0 17 May 2023
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression Siyue Wu Hongzhan Chen Xiaojun Quan Qifan Wang Rui-cang Wang VLM 14 18 0 17 May 2023
ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing Hua Shen Huang Chieh-Yang Tongshuang Wu Ting-Hao 'Kenneth' Huang 23 37 0 16 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 191 266 0 28 Apr 2023
Evaluating self-attention interpretability through human-grounded experimental protocol Milan Bhan Nina Achache Victor Legrand A. Blangero Nicolas Chesneau 26 9 0 27 Mar 2023
Holistically Explainable Vision Transformers Moritz D Boehle Mario Fritz Bernt Schiele ViT 38 9 0 20 Jan 2023
Opti-CAM: Optimizing saliency maps for interpretability Hanwei Zhang Felipe Torres R. Sicre Yannis Avrithis Stéphane Ayache 36 22 0 17 Jan 2023
DExT: Detector Explanation Toolkit Deepan Padmanabhan Paul G. Plöger Octavio Arriaga Matias Valdenegro-Toro 33 2 0 21 Dec 2022
Human-Guided Fair Classification for Natural Language Processing Florian E. Dorner Momchil Peychev Nikola Konstantinov Naman Goel Elliott Ash Martin Vechev FaML 19 3 0 20 Dec 2022
Azimuth: Systematic Error Analysis for Text Classification Gabrielle Gauthier Melançon Orlando Marquez Ayala Lindsay D. Brin Chris Tyler Frederic Branchaud-Charron Joseph Marinier Karine Grande Dieu-Thu Le 16 3 0 16 Dec 2022
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods Josip Jukić Martin Tutek Jan Snajder FAtt 21 0 0 15 Nov 2022
Understanding Text Classification Data and Models Using Aggregated Input Salience Sebastian Ebert Alice Shoshana Jakobovits Katja Filippova FAtt 22 3 0 10 Nov 2022
ViT-CX: Causal Explanation of Vision Transformers Weiyan Xie Xiao-hui Li Caleb Chen Cao Nevin L. Zhang ViT 29 17 0 06 Nov 2022
Deconfounding Legal Judgment Prediction for European Court of Human Rights Cases Towards Better Alignment with Experts Santosh T.Y.S.S Shanshan Xu O. Ichim Matthias Grabmair 28 26 0 25 Oct 2022
Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems Parikshit Bansal Yashoteja Prabhu Emre Kıcıman Amit Sharma CML OOD 33 0 0 07 Oct 2022
polyBERT: A chemical language model to enable fully machine-driven ultrafast polymer informatics Christopher Kuenneth R. Ramprasad 34 101 0 29 Sep 2022
Towards Faithful Model Explanation in NLP: A Survey Qing Lyu Marianna Apidianaki Chris Callison-Burch XAI 114 107 0 22 Sep 2022