arXiv: 2309.01029 (v3)
Explainability for Large Language Models: A Survey
2 September 2023
Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jundong Li
LRM
Papers citing "Explainability for Large Language Models: A Survey" (19 of 119 shown)
BERT Rediscovers the Classical NLP Pipeline — Ian Tenney, Dipanjan Das, Ellie Pavlick (15 May 2019). MILM, SSeg. 1,482 citations.

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition — Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, Xia Hu (27 Mar 2019). FAtt. 50 citations.

Attention is not Explanation — Sarthak Jain, Byron C. Wallace (26 Feb 2019). FAtt. 1,330 citations.

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models — Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, A. Bau, James R. Glass (21 Dec 2018). MILM. 192 citations.

An Introductory Survey on Attention Mechanisms in NLP Problems — Dichao Hu (12 Nov 2018). AIMat. 247 citations.

Targeted Syntactic Evaluation of Language Models — Rebecca Marvin, Tal Linzen (27 Aug 2018). 417 citations.

Dissecting Contextual Word Embeddings: Architecture and Representation — Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih (27 Aug 2018). 433 citations.

Techniques for Interpretable Machine Learning — Mengnan Du, Ninghao Liu, Xia Hu (31 Jul 2018). FaML. 1,092 citations.

Deep RNNs Encode Soft Hierarchical Syntax — Terra Blevins, Omer Levy, Luke Zettlemoyer (11 May 2018). 111 citations.

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models — Hendrik Strobelt, Sebastian Gehrmann, M. Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush (25 Apr 2018). VLM, HAI. 240 citations.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding — Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018). ELM. 7,201 citations.

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks — Yonatan Belinkov, Lluís Màrquez i Villodre, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass (23 Jan 2018). 165 citations.

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) — Been Kim, Martin Wattenberg, Justin Gilmer, Carrie J. Cai, James Wexler, F. Viégas, Rory Sayres (30 Nov 2017). FAtt. 1,848 citations.

The (Un)reliability of saliency methods — Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, D. Erhan, Been Kim (02 Nov 2017). FAtt, XAI. 689 citations.

A Unified Approach to Interpreting Model Predictions — Scott M. Lundberg, Su-In Lee (22 May 2017). FAtt. 22,135 citations.

Understanding Black-box Predictions via Influence Functions — Pang Wei Koh, Percy Liang (14 Mar 2017). TDI. 2,910 citations.

Axiomatic Attribution for Deep Networks — Mukund Sundararajan, Ankur Taly, Qiqi Yan (04 Mar 2017). OOD, FAtt. 6,027 citations.

Understanding Neural Networks through Representation Erasure — Jiwei Li, Will Monroe, Dan Jurafsky (24 Dec 2016). AAML, MILM. 567 citations.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier — Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (16 Feb 2016). FAtt, FaML. 17,092 citations.