Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
Xiaochuang Han, Byron C. Wallace, Yulia Tsvetkov
arXiv:2005.06676 · 14 May 2020 · Tags: MILM, FAtt, AAML, TDI
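For context, the influence-function attribution the paper builds on (Koh & Liang, 2017) scores each training example by how upweighting it would change the loss on a test example: I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z). A minimal sketch, assuming per-example gradients and an explicit Hessian small enough to handle directly (the function and parameter names are illustrative, not from the paper's code):

```python
import numpy as np

def influence_scores(grad_train, grad_test, hessian, damping=0.01):
    """Influence of each training example on a test example's loss.

    grad_train: (n, d) per-example training-loss gradients at the fitted params
    grad_test:  (d,)   test-loss gradient at the fitted params
    hessian:    (d, d) empirical loss Hessian at the fitted params
    Returns (n,) scores approximating d(test loss)/d(weight of example i):
    negative means upweighting the example lowers the test loss (helpful),
    positive means it raises it (harmful / possible artifact).
    """
    # Damping keeps the solve stable when the Hessian is near-singular.
    h = hessian + damping * np.eye(hessian.shape[0])
    # Inverse-Hessian-vector product H^{-1} grad_test, without forming H^{-1}.
    ihvp = np.linalg.solve(h, grad_test)
    return -grad_train @ ihvp
```

At NLP scale the Hessian is never materialized; implicit Hessian-vector products (as in FastIF and Scaling Up Influence Functions below) replace the direct solve.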
Papers citing "Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions" (36 of 36 papers shown):
- Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning — L. Zhang, Lijie Hu, Di Wang [LRM] · 0 citations · 17 Feb 2025
- Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models — Sepehr Kamahi, Yadollah Yaghoobzadeh · 0 citations · 21 Aug 2024
- InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks — Somnath Banerjee, Maulindu Sarkar, Punyajoy Saha, Binny Mathew, Animesh Mukherjee [TDI] · 0 citations · 22 Feb 2024
- Generalizing Backpropagation for Gradient-Based Interpretability — Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alex Warstadt, Ryan Cotterell [MILM, FAtt] · 7 citations · 06 Jul 2023
- How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning — Rochelle Choenni, Dan Garrette, Ekaterina Shutova · 15 citations · 22 May 2023
- Unstructured and structured data: Can we have the best of both worlds with large language models? — W. Tan · 1 citation · 25 Apr 2023
- Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs — Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi [TDI] · 33 citations · 14 Mar 2023
- IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models — Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh · 8 citations · 06 Mar 2023
- In-context Example Selection with Influences — Nguyen Tai, Eric Wong · 48 citations · 21 Feb 2023
- Contrastive Error Attribution for Finetuned Language Models — Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto [HILM] · 9 citations · 21 Dec 2022
- Towards Human-Centred Explainability Benchmarks For Text Classification — Viktor Schlegel, Erick Mendez Guzman, R. Batista-Navarro · 5 citations · 10 Nov 2022
- Influence Functions for Sequence Tagging Models — Sarthak Jain, Varun Manjunatha, Byron C. Wallace, A. Nenkova [TDI] · 8 citations · 25 Oct 2022
- This joke is [MASK]: Recognizing Humor and Offense with Prompting — Junze Li, Mengjie Zhao, Yubo Xie, Antonis Maronikolakis, Pearl Pu, Hinrich Schütze [AAML] · 1 citation · 25 Oct 2022
- Finding Dataset Shortcuts with Grammar Induction — Dan Friedman, Alexander Wettig, Danqi Chen · 14 citations · 20 Oct 2022
- Shortcut Learning of Large Language Models in Natural Language Understanding — Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu [KELM, OffRL] · 84 citations · 25 Aug 2022
- Then and Now: Quantifying the Longitudinal Validity of Self-Disclosed Depression Diagnoses — Keith Harrigian, Mark Dredze · 3 citations · 22 Jun 2022
- Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models — Esma Balkir, S. Kiritchenko, I. Nejadgholi, Kathleen C. Fraser · 36 citations · 08 Jun 2022
- It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers — Zheng Tang, Mihai Surdeanu · 6 citations · 25 Apr 2022
- Towards Explainable Evaluation Metrics for Natural Language Generation — Christoph Leiter, Piyawat Lertvittayakumjorn, M. Fomicheva, Wei-Ye Zhao, Yang Gao, Steffen Eger [AAML, ELM] · 20 citations · 21 Mar 2022
- First is Better Than Last for Language Data Influence — Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar [TDI] · 20 citations · 24 Feb 2022
- Diagnosing AI Explanation Methods with Folk Concepts of Behavior — Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova · 15 citations · 27 Jan 2022
- Scaling Up Influence Functions — Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov [TDI] · 90 citations · 06 Dec 2021
- Explainable Deep Learning in Healthcare: A Methodological Survey from an Attribution View — Di Jin, Elena Sergeeva, W. Weng, Geeticka Chauhan, Peter Szolovits [OOD] · 55 citations · 05 Dec 2021
- Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods — Peru Bhardwaj, John D. Kelleher, Luca Costabello, Declan O’Sullivan · 20 citations · 04 Nov 2021
- Interpreting Deep Learning Models in Natural Language Processing: A Review — Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard H. Hovy, Jiwei Li · 44 citations · 20 Oct 2021
- ProoFVer: Natural Logic Theorem Proving for Fact Verification — Amrith Krishna, Sebastian Riedel, Andreas Vlachos · 61 citations · 25 Aug 2021
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation — Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan Zhang · 17 citations · 09 Jun 2021
- Explanation-Based Human Debugging of NLP Models: A Survey — Piyawat Lertvittayakumjorn, Francesca Toni [LRM] · 79 citations · 30 Apr 2021
- Explaining the Road Not Taken — Hua Shen, Ting-Hao 'Kenneth' Huang [FAtt, XAI] · 9 citations · 27 Mar 2021
- Contrastive Explanations for Model Interpretability — Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg · 95 citations · 02 Mar 2021
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging — Han Guo, Nazneen Rajani, Peter Hase, Mohit Bansal, Caiming Xiong [TDI] · 102 citations · 31 Dec 2020
- Transformer Feed-Forward Layers Are Key-Value Memories — Mor Geva, R. Schuster, Jonathan Berant, Omer Levy [KELM] · 741 citations · 29 Dec 2020
- Efficient Estimation of Influence of a Training Instance — Sosuke Kobayashi, Sho Yokoi, Jun Suzuki, Kentaro Inui [TDI] · 15 citations · 08 Dec 2020
- Cross-Loss Influence Functions to Explain Deep Network Representations — Andrew Silva, Rohit Chopra, Matthew C. Gombolay [TDI] · 15 citations · 03 Dec 2020
- e-SNLI: Natural Language Inference with Natural Language Explanations — Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom [LRM] · 620 citations · 04 Dec 2018
- Towards A Rigorous Science of Interpretable Machine Learning — Finale Doshi-Velez, Been Kim [XAI, FaML] · 3,683 citations · 28 Feb 2017