Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
arXiv:2005.06676 · 14 May 2020 · Tags: MILM, FAtt, AAML, TDI
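For context, the influence-function attribution the paper builds on (Koh & Liang, 2017) scores each training example by how upweighting it would change the loss on a test example: I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z). A minimal sketch, assuming per-example gradients and an explicit Hessian small enough to handle directly (the function and parameter names are illustrative, not from the paper's code):

```python
import numpy as np

def influence_scores(grad_train, grad_test, hessian, damping=0.01):
    """Influence of each training example on a test example's loss.

    grad_train: (n, d) per-example training-loss gradients at the fitted params
    grad_test:  (d,)   test-loss gradient at the fitted params
    hessian:    (d, d) empirical loss Hessian at the fitted params
    Returns (n,) scores approximating d(test loss)/d(weight of example i):
    negative means upweighting the example lowers the test loss (helpful),
    positive means it raises it (harmful / possible artifact).
    """
    # Damping keeps the solve stable when the Hessian is near-singular.
    h = hessian + damping * np.eye(hessian.shape[0])
    # Inverse-Hessian-vector product H^{-1} grad_test, without forming H^{-1}.
    ihvp = np.linalg.solve(h, grad_test)
    return -grad_train @ ihvp
```

At NLP scale the Hessian is never materialized; implicit Hessian-vector products (as in FastIF and Scaling Up Influence Functions below) replace the direct solve.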
Papers citing "Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions" (36 of 36 papers shown):
- Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning — L. Zhang, Lijie Hu, Di Wang [LRM] · 0 citations · 17 Feb 2025
- Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models — Sepehr Kamahi, Yadollah Yaghoobzadeh · 0 citations · 21 Aug 2024
- InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks — Somnath Banerjee, Maulindu Sarkar, Punyajoy Saha, Binny Mathew, Animesh Mukherjee [TDI] · 0 citations · 22 Feb 2024
- Generalizing Backpropagation for Gradient-Based Interpretability — Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alex Warstadt, Ryan Cotterell [MILM, FAtt] · 7 citations · 06 Jul 2023
- How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning — Rochelle Choenni, Dan Garrette, Ekaterina Shutova · 15 citations · 22 May 2023
- Unstructured and structured data: Can we have the best of both worlds with large language models? — W. Tan · 1 citation · 25 Apr 2023
- Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs — Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi [TDI] · 33 citations · 14 Mar 2023
- IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models — Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh · 8 citations · 06 Mar 2023
- In-context Example Selection with Influences — Nguyen Tai, Eric Wong · 48 citations · 21 Feb 2023
- Contrastive Error Attribution for Finetuned Language Models — Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto [HILM] · 9 citations · 21 Dec 2022
- Towards Human-Centred Explainability Benchmarks For Text Classification — Viktor Schlegel, Erick Mendez Guzman, R. Batista-Navarro · 5 citations · 10 Nov 2022
- Influence Functions for Sequence Tagging Models — Sarthak Jain, Varun Manjunatha, Byron C. Wallace, A. Nenkova [TDI] · 8 citations · 25 Oct 2022
- This joke is [MASK]: Recognizing Humor and Offense with Prompting — Junze Li, Mengjie Zhao, Yubo Xie, Antonis Maronikolakis, Pearl Pu, Hinrich Schütze [AAML] · 1 citation · 25 Oct 2022
- Finding Dataset Shortcuts with Grammar Induction — Dan Friedman, Alexander Wettig, Danqi Chen · 14 citations · 20 Oct 2022
- Shortcut Learning of Large Language Models in Natural Language Understanding — Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu [KELM, OffRL] · 84 citations · 25 Aug 2022
- Then and Now: Quantifying the Longitudinal Validity of Self-Disclosed Depression Diagnoses — Keith Harrigian, Mark Dredze · 3 citations · 22 Jun 2022
- Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models — Esma Balkir, S. Kiritchenko, I. Nejadgholi, Kathleen C. Fraser · 36 citations · 08 Jun 2022
- It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers — Zheng Tang, Mihai Surdeanu · 6 citations · 25 Apr 2022
- Towards Explainable Evaluation Metrics for Natural Language Generation — Christoph Leiter, Piyawat Lertvittayakumjorn, M. Fomicheva, Wei-Ye Zhao, Yang Gao, Steffen Eger [AAML, ELM] · 20 citations · 21 Mar 2022
- First is Better Than Last for Language Data Influence — Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar [TDI] · 20 citations · 24 Feb 2022
- Diagnosing AI Explanation Methods with Folk Concepts of Behavior — Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova · 15 citations · 27 Jan 2022
- Scaling Up Influence Functions — Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov [TDI] · 90 citations · 06 Dec 2021
- Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View — Di Jin, Elena Sergeeva, W. Weng, Geeticka Chauhan, Peter Szolovits [OOD] · 55 citations · 05 Dec 2021
- Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods — Peru Bhardwaj, John D. Kelleher, Luca Costabello, Declan O’Sullivan · 20 citations · 04 Nov 2021
- Interpreting Deep Learning Models in Natural Language Processing: A Review — Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li · 44 citations · 20 Oct 2021
- ProoFVer: Natural Logic Theorem Proving for Fact Verification — Amrith Krishna, Sebastian Riedel, Andreas Vlachos · 61 citations · 25 Aug 2021
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation — Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan Zhang · 17 citations · 09 Jun 2021
- Explanation-Based Human Debugging of NLP Models: A Survey — Piyawat Lertvittayakumjorn, Francesca Toni [LRM] · 79 citations · 30 Apr 2021
- Explaining the Road Not Taken — Hua Shen, Ting-Hao 'Kenneth' Huang [FAtt, XAI] · 9 citations · 27 Mar 2021
- Contrastive Explanations for Model Interpretability — Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg · 95 citations · 02 Mar 2021
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging — Han Guo, Nazneen Rajani, Peter Hase, Mohit Bansal, Caiming Xiong [TDI] · 102 citations · 31 Dec 2020
- Transformer Feed-Forward Layers Are Key-Value Memories — Mor Geva, R. Schuster, Jonathan Berant, Omer Levy [KELM] · 741 citations · 29 Dec 2020
- Efficient Estimation of Influence of a Training Instance — Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui [TDI] · 15 citations · 08 Dec 2020
- Cross-Loss Influence Functions to Explain Deep Network Representations — Andrew Silva, Rohit Chopra, Matthew C. Gombolay [TDI] · 15 citations · 03 Dec 2020
- e-SNLI: Natural Language Inference with Natural Language Explanations — Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom [LRM] · 620 citations · 04 Dec 2018
- Towards A Rigorous Science of Interpretable Machine Learning — Finale Doshi-Velez, Been Kim [XAI, FaML] · 3,683 citations · 28 Feb 2017