Learning to Deceive with Attention-Based Explanations (arXiv:1909.07913)

Danish Pruthi, Mansi Gupta, Bhuwan Dhingra, Graham Neubig, Zachary Chase Lipton (17 September 2019)

Papers citing "Learning to Deceive with Attention-Based Explanations"

50 papers shown.
"Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation" by Duc Hau Nguyen, Cyrielle Mallart, Guillaume Gravier, Pascale Sébillot (22 Jan 2025)
"Explanation Regularisation through the Lens of Attributions" by Pedro Ferreira, Wilker Aziz, Ivan Titov (23 Jul 2024)
"Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings" by Timothee Mickus, Raúl Vázquez (10 Oct 2023)
"Evaluating Explanation Methods for Vision-and-Language Navigation" by Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan (10 Oct 2023) [XAI]
"Explaining How Transformers Use Context to Build Predictions" by Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà (21 May 2023)
"Faithful Chain-of-Thought Reasoning" by Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, D. Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch (31 Jan 2023) [ReLM, LRM]
"Tensions Between the Proxies of Human Values in AI" by Teresa Datta, D. Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson (14 Dec 2022)
"MEGAN: Multi-Explanation Graph Attention Network" by Jonas Teufel, Luca Torresi, Patrick Reiser, Pascal Friederich (23 Nov 2022)
"ViT-CX: Causal Explanation of Vision Transformers" by Weiyan Xie, Xiao-hui Li, Caleb Chen Cao, Nevin L. Zhang (06 Nov 2022) [ViT]
"Salience Allocation as Guidance for Abstractive Summarization" by Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu (22 Oct 2022)
"StyLEx: Explaining Style Using Human Lexical Annotations" by Shirley Anugrah Hayati, Kyumin Park, Dheeraj Rajagopal, Lyle Ungar, Dongyeop Kang (14 Oct 2022)
"On the Explainability of Natural Language Processing Deep Models" by Julia El Zini, M. Awad (13 Oct 2022)
"Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making" by Jakob Schoeffer, Maria De-Arteaga, Niklas Kuehl (23 Sep 2022) [FaML]
"How to Dissect a Muppet: The Structure of Transformer Embedding Spaces" by Timothee Mickus, Denis Paperno, Mathieu Constant (07 Jun 2022)
"Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps" by Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein (23 Apr 2022)
"ProtoTEx: Explaining Model Decisions with Prototype Tensors" by Anubrata Das, Chitrank Gupta, Venelin Kovatchev, Matthew Lease, Junjie Li (11 Apr 2022)
"Interpretation of Black Box NLP Models: A Survey" by Shivani Choudhary, N. Chatterjee, S. K. Saha (31 Mar 2022) [FAtt]
"Measuring the Mixing of Contextual Information in the Transformer" by Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà (08 Mar 2022)
"POTATO: exPlainable infOrmation exTrAcTion framewOrk" by Adam Kovacs, Kinga Gémes, Eszter Iklódi, Gábor Recski (31 Jan 2022)
"Counterfactual Explanations for Models of Code" by Jürgen Cito, Işıl Dillig, V. Murali, S. Chandra (10 Nov 2021) [AAML, LRM]
"Revisiting Methods for Finding Influential Examples" by Karthikeyan K, Anders Søgaard (08 Nov 2021) [TDI]
"Understanding Interlocking Dynamics of Cooperative Rationalization" by Mo Yu, Yang Zhang, Shiyu Chang, Tommi Jaakkola (26 Oct 2021)
"Interpreting Deep Learning Models in Natural Language Processing: A Review" by Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li (20 Oct 2021)
"Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining" by Andreas Madsen, Nicholas Meade, Vaibhav Adlakha, Siva Reddy (15 Oct 2021)
"Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models" by Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang (14 Oct 2021) [AAML]
"Automated and Explainable Ontology Extension Based on Deep Learning: A Case Study in the Chemical Domain" by A. Memariani, Martin Glauer, Fabian Neuhaus, Till Mossakowski, Janna Hastings (19 Sep 2021)
"Diagnostics-Guided Explanation Generation" by Pepa Atanasova, J. Simonsen, Christina Lioma, Isabelle Augenstein (08 Sep 2021) [LRM, FAtt]
"Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience" by G. Chrysostomou, Nikolaos Aletras (31 Aug 2021)
"DuTrust: A Sentiment Analysis Dataset for Trustworthiness Evaluation" by Lijie Wang, Hao Liu, Shu-ping Peng, Hongxuan Tang, Xinyan Xiao, Ying-Cong Chen, Hua Wu, Haifeng Wang (30 Aug 2021)
"A Survey on Automated Fact-Checking" by Zhijiang Guo, M. Schlichtkrull, Andreas Vlachos (26 Aug 2021)
"Zorro: Valid, Sparse, and Stable Explanations in Graph Neural Networks" by Thorben Funke, Megha Khosla, Mandeep Rathee, Avishek Anand (18 May 2021) [FAtt]
"Collaborative Graph Learning with Auxiliary Text for Temporal Event Prediction in Healthcare" by Chang Lu, Chandan K. Reddy, Prithwish Chakraborty, Samantha Kleinberg, Yue Ning (16 May 2021)
"How Reliable are Model Diagnostics?" by V. Aribandi, Yi Tay, Donald Metzler (12 May 2021)
"Rationalization through Concepts" by Diego Antognini, Boi Faltings (11 May 2021) [FAtt]
"Which transformer architecture fits my data? A vocabulary bottleneck in self-attention" by Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua (09 May 2021)
"Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification" by G. Chrysostomou, Nikolaos Aletras (06 May 2021)
"Do Feature Attribution Methods Correctly Attribute Features?" by Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, J. Shah (27 Apr 2021) [FAtt, XAI]
"MeSIN: Multilevel Selective and Interactive Network for Medication Recommendation" by Yang An, Liang Zhang, Mao You, Xueqing Tian, Bo Jin, Xiaopeng Wei (22 Apr 2021)
"Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training" by Shunsuke Kitada, Hitoshi Iyatomi (18 Apr 2021) [AAML]
"Of Non-Linearity and Commutativity in BERT" by Sumu Zhao, Damian Pascual, Gino Brunner, Roger Wattenhofer (12 Jan 2021)
"Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models" by Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld (01 Jan 2021)
"Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment Analysis" by Sunny Verma, Jiwei Wang, Zhefeng Ge, Rujia Shen, Fan Jin, Yang Wang, Fang Chen, Wei Liu (16 Oct 2020)
"The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?" by Jasmijn Bastings, Katja Filippova (12 Oct 2020) [XAI, LRM]
"BERTology Meets Biology: Interpreting Attention in Protein Language Models" by Jesse Vig, Ali Madani, Lav Varshney, Caiming Xiong, R. Socher, Nazneen Rajani (26 Jun 2020)
"The Depth-to-Width Interplay in Self-Attention" by Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua (22 Jun 2020)
"Staying True to Your Word: (How) Can Attention Become Explanation?" by Martin Tutek, Jan Snajder (19 May 2020)
"Quantifying Attention Flow in Transformers" by Samira Abnar, Willem H. Zuidema (02 May 2020)
"Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data" by B. Shickel, Parisa Rashidi (27 Apr 2020) [AI4TS]
"Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?" by Alon Jacovi, Yoav Goldberg (07 Apr 2020) [XAI]
"On Identifiability in Transformers" by Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer (12 Aug 2019) [ViT]