Interpreting Language Models with Contrastive Explanations

21 February 2022
Kayo Yin, Graham Neubig
MILM

Papers citing "Interpreting Language Models with Contrastive Explanations"

30 / 30 papers shown

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Jirui Qi, Raquel Fernández, Arianna Bisazza
RALM
01 Apr 2025

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi, Yadollah Yaghoobzadeh
21 Aug 2024

CELL your Model: Contrastive Explanations for Large Language Models
Ronny Luss, Erik Miehling, Amit Dhurandhar
17 Jun 2024

MambaLRP: Explaining Selective State Space Sequence Models
F. Jafari, G. Montavon, Klaus-Robert Müller, Oliver Eberle
Mamba
11 Jun 2024

LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan
ELM, LM&MA
02 Feb 2024

ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills, Shiye Su, Stuart J. Russell, Scott Emmons
20 Dec 2023

When Does Translation Require Context? A Data-driven, Multilingual Exploration
Patrick Fernandes, Kayo Yin, Emmy Liu, André F. T. Martins, Graham Neubig
15 Sep 2021

Post-hoc Interpretability for Neural NLP: A Survey
Andreas Madsen, Siva Reddy, A. Chandar
XAI
10 Aug 2021

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart M. Shieber, Tal Linzen, Yonatan Belinkov
10 Jun 2021

Do Context-Aware Translation Models Pay the Right Attention?
Kayo Yin, Patrick Fernandes, Danish Pruthi, Aditi Chaudhary, André F. T. Martins, Graham Neubig
14 May 2021

Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Shauli Ravfogel, Grusha Prasad, Tal Linzen, Yoav Goldberg
14 May 2021

Contrastive Explanations for Model Interpretability
Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
02 Mar 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020

Evaluating Explanations: How much do explanations from the teacher aid students?
Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary Chase Lipton, Graham Neubig, William W. Cohen
FAtt, XAI
01 Dec 2020

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
Peter Hase, Mohit Bansal
FAtt
04 May 2020

BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman
02 Dec 2019

AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh
MILM
19 Sep 2019

Fine-grained Sentiment Analysis with Faithful Attention
Ruiqi Zhong, Steven Shao, Kathleen McKeown
19 Aug 2019

Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov, James R. Glass
21 Dec 2018

Do Explanations make VQA Models more Predictable to a Human?
Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh
FAtt
29 Oct 2018

Marian: Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu T. Hoang, Kenneth Heafield, ..., Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch
01 Apr 2018

Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, Qiqi Yan
OOD, FAtt
04 Mar 2017

Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim
XAI, FaML
28 Feb 2017

Understanding Neural Networks through Representation Erasure
Jiwei Li, Will Monroe, Dan Jurafsky
AAML, MILM
24 Dec 2016

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM
26 Sep 2016

Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
Avanti Shrikumar, Peyton Greenside, A. Shcherbina, A. Kundaje
FAtt
05 May 2016

Visualizing and Understanding Neural Models in NLP
Jiwei Li, Xinlei Chen, Eduard H. Hovy, Dan Jurafsky
MILM, FAtt
02 Jun 2015

Extraction of Salient Sentences from Labelled Documents
Misha Denil, Alban Demiraj, Nando de Freitas
21 Dec 2014

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
FAtt
20 Dec 2013

How to Explain Individual Classification Decisions
D. Baehrens, T. Schroeter, Stefan Harmeling, M. Kawanabe, K. Hansen, K. Müller
FAtt
06 Dec 2009