Pathologies of Neural Models Make Interpretations Difficult
Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan L. Boyd-Graber
20 April 2018 · arXiv:1804.07781 · [AAML, FAtt]
Papers citing "Pathologies of Neural Models Make Interpretations Difficult" (50 of 76 shown):
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models. Sepehr Kamahi, Yadollah Yaghoobzadeh. 21 Aug 2024.
CELL your Model: Contrastive Explanations for Large Language Models. Ronny Luss, Erik Miehling, Amit Dhurandhar. 17 Jun 2024.
How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation. Yoo Yeon Sung, Ishani Mondal, Jordan L. Boyd-Graber. 20 Jan 2024.
Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion. Nishtha Madaan, Srikanta J. Bedathur. 21 Dec 2023. [DiffM]
Explaining high-dimensional text classifiers. Odelia Melamed, Rich Caruana. 22 Nov 2023.
Toward Stronger Textual Attack Detectors. Pierre Colombo, Marine Picot, Nathan Noiry, Guillaume Staerman, Pablo Piantanida. 21 Oct 2023.
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks. Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren. 31 Jul 2023. [SILM]
Explaining Math Word Problem Solvers. Abby Newcomb, Jugal Kalita. 24 Jul 2023.
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords. Shahriar Golchin, Mihai Surdeanu, N. Tavabi, A. Kiapour. 14 Jul 2023.
Open Set Relation Extraction via Unknown-Aware Training. Jun Zhao, Xin Zhao, Wenyu Zhan, Qi Zhang, Tao Gui, Zhongyu Wei, Yunwen Chen, Xiang Gao, Xuanjing Huang. 08 Jun 2023.
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap. Q. V. Liao, J. Vaughan. 02 Jun 2023.
Don't Retrain, Just Rewrite: Countering Adversarial Perturbations by Rewriting Text. Ashim Gupta, Carter Blum, Temma Choji, Yingjie Fei, Shalin S Shah, Alakananda Vempala, Vivek Srikumar. 25 May 2023. [AAML]
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification. Jiasheng Si, Yingjie Zhu, Deyu Zhou. 16 May 2023. [AAML]
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models. Lukáš Mikula, Michal Štefánik, Marek Petrovič, Petr Sojka. 11 May 2023.
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection. Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat. 18 Jan 2023. [HILM]
Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods. Josip Jukić, Martin Tutek, Jan Šnajder. 15 Nov 2022. [FAtt]
AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning. Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie. 12 Oct 2022.
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine Reading Comprehension. Suzhe He, Shumin Shi, Chenghao Wu. 06 Oct 2022.
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey. Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Jinpeng Wang, ..., Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Sam Bowman. 26 Aug 2022.
ferret: a Framework for Benchmarking Explainers on Transformers. Giuseppe Attanasio, Eliana Pastor, C. Bonaventura, Debora Nozza. 02 Aug 2022.
An Interpretability Evaluation Benchmark for Pre-trained Language Models. Ya-Ming Shen, Lijie Wang, Ying-Cong Chen, Xinyan Xiao, Jing Liu, Hua Wu. 28 Jul 2022.
Fooling Explanations in Text Classifiers. Adam Ivankay, Ivan Girardi, Chiara Marchiori, P. Frossard. 07 Jun 2022. [AAML]
Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images. Tom Ron, M. Weiler-Sagie, Tamir Hazan. 06 Jun 2022. [FAtt, MedIm]
Optimizing Relevance Maps of Vision Transformers Improves Robustness. Hila Chefer, Idan Schwartz, Lior Wolf. 02 Jun 2022. [ViT]
Learning to Ignore Adversarial Attacks. Yiming Zhang, Yan Zhou, Samuel Carton, Chenhao Tan. 23 May 2022.
A Fine-grained Interpretability Evaluation Benchmark for Neural NLP. Lijie Wang, Yaozong Shen, Shu-ping Peng, Shuai Zhang, Xinyan Xiao, Hao Liu, Hongxuan Tang, Ying-Cong Chen, Hua Wu, Haifeng Wang. 23 May 2022. [ELM]
The Solvability of Interpretability Evaluation Metrics. Yilun Zhou, J. Shah. 18 May 2022.
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers. Zheng Tang, Mihai Surdeanu. 25 Apr 2022.
Learning to Scaffold: Optimizing Model Explanations for Teaching. Patrick Fernandes, Marcos Vinícius Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig. 22 Apr 2022. [FAtt]
XAI for Transformers: Better Explanations through Conservative Propagation. Ameen Ali, Thomas Schnake, Oliver Eberle, G. Montavon, Klaus-Robert Müller, Lior Wolf. 15 Feb 2022. [FAtt]
Diagnosing AI Explanation Methods with Folk Concepts of Behavior. Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova. 27 Jan 2022.
Measure and Improve Robustness in NLP Models: A Survey. Xuezhi Wang, Haohan Wang, Diyi Yang. 15 Dec 2021.
Quantifying and Understanding Adversarial Examples in Discrete Input Spaces. Volodymyr Kuleshov, Evgenii Nikishin, S. Thakoor, Tingfung Lau, Stefano Ermon. 12 Dec 2021. [AAML]
Counterfactual Explanations for Models of Code. Jürgen Cito, Işıl Dillig, V. Murali, S. Chandra. 10 Nov 2021. [AAML, LRM]
Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models? Thang M. Pham, Trung H. Bui, Long Mai, Anh Totti Nguyen. 22 Oct 2021.
Interpreting Deep Learning Models in Natural Language Processing: A Review. Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li. 20 Oct 2021.
The Irrationality of Neural Rationale Models. Yiming Zheng, Serena Booth, J. Shah, Yilun Zhou. 14 Oct 2021.
CheerBots: Chatbots toward Empathy and Emotion using Reinforcement Learning. Jiun-hao Jhan, Chao-Peng Liu, Shyh-Kang Jeng, Hung-yi Lee. 08 Oct 2021.
AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses. Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junjie Li, R. Shah, Changyou Chen. 24 Sep 2021. [AAML]
Let the CAT out of the bag: Contrastive Attributed explanations for Text. Saneem A. Chemmengath, A. Azad, Ronny Luss, Amit Dhurandhar. 16 Sep 2021. [FAtt]
Neuron-level Interpretation of Deep NLP Models: A Survey. Hassan Sajjad, Nadir Durrani, Fahim Dalvi. 30 Aug 2021. [MILM, AI4CE]
On the Lack of Robust Interpretability of Neural Text Classifiers. Muhammad Bilal Zafar, Michele Donini, Dylan Slack, Cédric Archambeau, Sanjiv Ranjan Das, K. Kenthapadi. 08 Jun 2021. [AAML]
On the Sensitivity and Stability of Model Interpretations in NLP. Fan Yin, Zhouxing Shi, Cho-Jui Hsieh, Kai-Wei Chang. 18 Apr 2021. [FAtt]
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks. Hung Le, Nancy F. Chen, Guosheng Lin. 16 Apr 2021. [MLLM]
Explaining the Road Not Taken. Hua Shen, Ting-Hao 'Kenneth' Huang. 27 Mar 2021. [FAtt, XAI]
Local Interpretations for Explainable Natural Language Processing: A Survey. Siwen Luo, Hamish Ivison, S. Han, Josiah Poon. 20 Mar 2021. [MILM]
FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging. Han Guo, Nazneen Rajani, Peter Hase, Joey Tianyi Zhou, Caiming Xiong. 31 Dec 2020. [TDI]
Self-Explaining Structures Improve NLP Models. Zijun Sun, Chun Fan, Qinghong Han, Xiaofei Sun, Yuxian Meng, Fei Wu, Jiwei Li. 03 Dec 2020. [MILM, XAI, LRM, FAtt]
Interpretation of NLP models through input marginalization. Siwon Kim, Jihun Yi, Eunji Kim, Sungroh Yoon. 27 Oct 2020. [MILM, FAtt]
Counterfactual Variable Control for Robust and Interpretable Question Answering. S. Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun. 12 Oct 2020. [AAML, OOD]