Logic Traps in Evaluating Attribution Scores

12 September 2021

Yuanzhe Zhang

Jun Zhao

Papers citing "Logic Traps in Evaluating Attribution Scores"

Title
Normalized AOPC: Fixing Misleading Faithfulness Metrics for Feature Attribution Explainability Joakim Edin Andreas Geert Motzfeldt Casper L. Christensen Tuukka Ruotsalo Lars Maaløe Maria Maistro 122 4 0 15 Aug 2024
Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing Sanchit Sinha Hanjie Chen Arshdeep Sekhon Yangfeng Ji Yanjun Qi AAML FAtt 60 41 0 11 Aug 2021
Evaluating Saliency Methods for Neural Language Models Shuoyang Ding Philipp Koehn FAtt XAI 47 55 0 12 Apr 2021
Interpretation of NLP models through input marginalization Siwon Kim Jihun Yi Eunji Kim Sungroh Yoon MILM FAtt 78 60 0 27 Oct 2020
Gradient-based Analysis of NLP Models is Manipulable Junlin Wang Jens Tuyls Eric Wallace Sameer Singh AAML FAtt 68 60 0 12 Oct 2020
Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers Hanjie Chen Yangfeng Ji AAML VLM 87 66 0 01 Oct 2020
How does this interaction affect me? Interpretable attribution for feature interactions Michael Tsang Sirisha Rambhatla Yan Liu FAtt 68 87 0 19 Jun 2020
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer Y. Hao Li Dong Furu Wei Ke Xu ViT 80 225 0 23 Apr 2020
Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? Alon Jacovi Yoav Goldberg XAI 131 600 0 07 Apr 2020
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection Hanjie Chen Guangtao Zheng Yangfeng Ji FAtt 95 95 0 04 Apr 2020
ERASER: A Benchmark to Evaluate Rationalized NLP Models Jay DeYoung Sarthak Jain Nazneen Rajani Eric P. Lehman Caiming Xiong R. Socher Byron C. Wallace 130 638 0 08 Nov 2019
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models Eric Wallace Jens Tuyls Junlin Wang Sanjay Subramanian Matt Gardner Sameer Singh MILM 68 138 0 19 Sep 2019
Learning to Deceive with Attention-Based Explanations Danish Pruthi Mansi Gupta Bhuwan Dhingra Graham Neubig Zachary Chase Lipton 80 193 0 17 Sep 2019
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment Di Jin Zhijing Jin Qiufeng Wang Peter Szolovits SILM AAML 185 1,086 0 27 Jul 2019
Interpretable Neural Predictions with Differentiable Binary Variables Jasmijn Bastings Wilker Aziz Ivan Titov 82 214 0 20 May 2019
Attention is not Explanation Sarthak Jain Byron C. Wallace FAtt 148 1,328 0 26 Feb 2019
Analysis Methods in Neural Language Processing: A Survey Yonatan Belinkov James R. Glass 95 558 0 21 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,175 0 11 Oct 2018
Learning for Single-Shot Confidence Calibration in Deep Neural Networks through Stochastic Inferences Seonguk Seo Paul Hongsuck Seo Bohyung Han FedML UQCV BDL 125 76 0 28 Sep 2018
L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data Jianbo Chen Le Song Martin J. Wainwright Michael I. Jordan FAtt TDI 115 216 0 08 Aug 2018
On the Robustness of Interpretability Methods David Alvarez-Melis Tommi Jaakkola 84 528 0 21 Jun 2018
RISE: Randomized Input Sampling for Explanation of Black-box Models Vitali Petsiuk Abir Das Kate Saenko FAtt 181 1,176 0 19 Jun 2018
Pathologies of Neural Models Make Interpretations Difficult Shi Feng Eric Wallace Alvin Grissom II Mohit Iyyer Pedro Rodriguez Jordan L. Boyd-Graber AAML FAtt 82 321 0 20 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.1K 7,196 0 20 Apr 2018
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs W. James Murdoch Peter J. Liu Bin Yu 80 210 0 16 Jan 2018
Mitigating Adversarial Effects Through Randomization Cihang Xie Jianyu Wang Zhishuai Zhang Zhou Ren Alan Yuille AAML 115 1,061 0 06 Nov 2017
The (Un)reliability of saliency methods Pieter-Jan Kindermans Sara Hooker Julius Adebayo Maximilian Alber Kristof T. Schütt Sven Dähne D. Erhan Been Kim FAtt XAI 106 688 0 02 Nov 2017
Interpretation of Neural Networks is Fragile Amirata Ghorbani Abubakar Abid James Zou FAtt AAML 133 870 0 29 Oct 2017
On Calibration of Modern Neural Networks Chuan Guo Geoff Pleiss Yu Sun Kilian Q. Weinberger UQCV 299 5,862 0 14 Jun 2017
A Unified Approach to Interpreting Model Predictions Scott M. Lundberg Su-In Lee FAtt 1.1K 22,018 0 22 May 2017
Ensemble Adversarial Training: Attacks and Defenses Florian Tramèr Alexey Kurakin Nicolas Papernot Ian Goodfellow Dan Boneh Patrick McDaniel AAML 177 2,729 0 19 May 2017
RACE: Large-scale ReAding Comprehension Dataset From Examinations Guokun Lai Qizhe Xie Hanxiao Liu Yiming Yang Eduard H. Hovy ELM 193 1,357 0 15 Apr 2017
Learning Important Features Through Propagating Activation Differences Avanti Shrikumar Peyton Greenside A. Kundaje FAtt 203 3,881 0 10 Apr 2017
Axiomatic Attribution for Deep Networks Mukund Sundararajan Ankur Taly Qiqi Yan OOD FAtt 193 6,018 0 04 Mar 2017
Towards A Rigorous Science of Interpretable Machine Learning Finale Doshi-Velez Been Kim XAI FaML 406 3,813 0 28 Feb 2017
Understanding Neural Networks through Representation Erasure Jiwei Li Will Monroe Dan Jurafsky AAML MILM 97 567 0 24 Dec 2016
Rationalizing Neural Predictions Tao Lei Regina Barzilay Tommi Jaakkola 129 812 0 13 Jun 2016
The Limitations of Deep Learning in Adversarial Settings Nicolas Papernot Patrick McDaniel S. Jha Matt Fredrikson Z. Berkay Celik A. Swami AAML 115 3,967 0 24 Nov 2015
Evaluating the visualization of what a Deep Neural Network has learned Wojciech Samek Alexander Binder G. Montavon Sebastian Lapuschkin K. Müller XAI 139 1,199 0 21 Sep 2015
Explaining and Harnessing Adversarial Examples Ian Goodfellow Jonathon Shlens Christian Szegedy AAML GAN 282 19,121 0 20 Dec 2014