Interpretation of Black Box NLP Models: A Survey

31 March 2022

Papers cited by "Interpretation of Black Box NLP Models: A Survey"

34 / 84 papers shown

Title
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information Mario Giulianelli J. Harding Florian Mohnert Dieuwke Hupkes Willem H. Zuidema 56 191 0 24 Aug 2018
Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions Milo Honegger 43 35 0 15 Aug 2018
Textual Explanations for Self-Driving Vehicles Jinkyu Kim Anna Rohrbach Trevor Darrell John F. Canny Zeynep Akata 49 340 0 30 Jul 2018
A Game-Based Approximate Verification of Deep Neural Networks with Provable Guarantees Min Wu Matthew Wicker Wenjie Ruan Xiaowei Huang Marta Kwiatkowska AAML 50 111 0 10 Jul 2018
On the Robustness of Interpretability Methods David Alvarez-Melis Tommi Jaakkola 70 526 0 21 Jun 2018
Did the Model Understand the Question? Pramod Kaushik Mudrakarta Ankur Taly Mukund Sundararajan Kedar Dhamdhere ELM OOD FAtt 51 197 0 14 May 2018
Interpretable Adversarial Perturbation in Input Embedding Space for Text Motoki Sato Jun Suzuki Hiroyuki Shindo Yuji Matsumoto 47 191 0 08 May 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 319 892 0 03 May 2018
Adversarial Attacks Against Medical Deep Learning Systems S. G. Finlayson Hyung Won Chung I. Kohane Andrew L. Beam SILM AAML OOD MedIm 50 231 0 15 Apr 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence Dong Huk Park Lisa Anne Hendricks Zeynep Akata Anna Rohrbach Bernt Schiele Trevor Darrell Marcus Rohrbach 73 421 0 15 Feb 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 192 11,542 0 15 Feb 2018
Consistent Individualized Feature Attribution for Tree Ensembles Scott M. Lundberg G. Erion Su-In Lee FAtt TDI 59 1,392 0 12 Feb 2018
Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution Judea Pearl CML 61 334 0 11 Jan 2018
HotFlip: White-Box Adversarial Examples for Text Classification J. Ebrahimi Anyi Rao Daniel Lowd Dejing Dou AAML 52 78 0 19 Dec 2017
Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure Dieuwke Hupkes Sara Veldhoen Willem H. Zuidema 69 277 0 28 Nov 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 642 130,942 0 12 Jun 2017
A Unified Approach to Interpreting Model Predictions Scott M. Lundberg Su-In Lee FAtt 975 21,815 0 22 May 2017
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems Wang Ling Dani Yogatama Chris Dyer Phil Blunsom AIMat 76 724 0 11 May 2017
What do Neural Machine Translation Models Learn about Morphology? Yonatan Belinkov Nadir Durrani Fahim Dalvi Hassan Sajjad James R. Glass 98 414 0 11 Apr 2017
Learning Important Features Through Propagating Activation Differences Avanti Shrikumar Peyton Greenside A. Kundaje FAtt 182 3,865 0 10 Apr 2017
Axiomatic Attribution for Deep Networks Mukund Sundararajan Ankur Taly Qiqi Yan OOD FAtt 175 5,968 0 04 Mar 2017
Towards A Rigorous Science of Interpretable Machine Learning Finale Doshi-Velez Been Kim XAI FaML 376 3,776 0 28 Feb 2017
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks Yossi Adi Einat Kermany Yonatan Belinkov Ofer Lavi Yoav Goldberg 59 545 0 15 Aug 2016
Enriching Word Vectors with Subword Information Piotr Bojanowski Edouard Grave Armand Joulin Tomas Mikolov NAI SSL VLM 220 9,957 0 15 Jul 2016
Explaining Predictions of Non-Linear Classifiers in NLP L. Arras F. Horn G. Montavon K. Müller Wojciech Samek FAtt 74 117 0 23 Jun 2016
The Mythos of Model Interpretability Zachary Chase Lipton FaML 166 3,685 0 10 Jun 2016
Adversarial Feature Learning Jeff Donahue Philipp Krähenbühl Trevor Darrell GAN 107 1,608 0 31 May 2016
Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers Alexander Binder G. Montavon Sebastian Lapuschkin K. Müller Wojciech Samek FAtt 72 460 0 04 Apr 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier Marco Tulio Ribeiro Sameer Singh Carlos Guestrin FAtt FaML 1.0K 16,931 0 16 Feb 2016
Visualizing and Understanding Neural Models in NLP Jiwei Li Xinlei Chen Eduard H. Hovy Dan Jurafsky MILM FAtt 75 707 0 02 Jun 2015
Striving for Simplicity: The All Convolutional Net Jost Tobias Springenberg Alexey Dosovitskiy Thomas Brox Martin Riedmiller FAtt 232 4,665 0 21 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 513 27,263 0 01 Sep 2014
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps Karen Simonyan Andrea Vedaldi Andrew Zisserman FAtt 295 7,279 0 20 Dec 2013
Efficient Estimation of Word Representations in Vector Space Tomas Mikolov Kai Chen G. Corrado J. Dean 3DV 633 31,469 0 16 Jan 2013