Towards Faithful Model Explanation in NLP: A Survey

22 September 2022

Marianna Apidianaki

Papers citing "Towards Faithful Model Explanation in NLP: A Survey"

Title
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods Mahdi Dhaini Ege Erdogan Nils Feldhus Gjergji Kasneci 49 0 0 02 May 2025
Probabilistic Stability Guarantees for Feature Attributions Helen Jin Anton Xue Weiqiu You Surbhi Goel Eric Wong 27 0 0 18 Apr 2025
Explaining Humour Style Classifications: An XAI Approach to Understanding Computational Humour Analysis Mary Ogbuka Kenneth Foaad Khosmood Abbas Edalat 43 0 0 06 Jan 2025
A Tale of Two Imperatives: Privacy and Explainability Supriya Manna Niladri Sett 100 0 0 30 Dec 2024
Variational Language Concepts for Interpreting Foundation Language Models Hengyi Wang Shiwei Tan Zhiqing Hong Desheng Zhang Hao Wang 34 3 0 04 Oct 2024
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI Xu Zheng Farhad Shirani Zhuomin Chen Chaohao Lin Wei Cheng Wenbo Guo Dongsheng Luo AAML 38 0 0 03 Oct 2024
Enhancing adversarial robustness in Natural Language Inference using explanations Alexandros Koulakos Maria Lymperaiou Giorgos Filandrianos Giorgos Stamou SILM AAML 43 0 0 11 Sep 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs Nitay Calderon Roi Reichart 40 10 0 27 Jul 2024
Perception of Phonological Assimilation by Neural Speech Recognition Models Charlotte Pouw Marianne de Heer Kloots A. Alishahi Willem H. Zuidema 49 2 0 21 Jun 2024
Latent Concept-based Explanation of NLP Models Xuemin Yu Fahim Dalvi Nadir Durrani Marzia Nouri Hassan Sajjad LRM FAtt 29 1 0 18 Apr 2024
Can Interpretability Layouts Influence Human Perception of Offensive Sentences? Thiago Freitas dos Santos Nardine Osman Marco Schorlemmer 24 0 0 01 Mar 2024
Improving Interpretation Faithfulness for Vision Transformers Lijie Hu Yixin Liu Ninghao Liu Mengdi Huai Lichao Sun Di Wang 41 5 0 29 Nov 2023
Evaluating Explanation Methods for Vision-and-Language Navigation Guanqi Chen Lei Yang Guanhua Chen Jia Pan XAI 23 0 0 10 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers Anna Langedijk Hosein Mohebbi Gabriele Sarti Willem H. Zuidema Jaap Jumelet 32 10 0 05 Oct 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations Yanda Chen Ruiqi Zhong Narutatsu Ri Chen Zhao He He Jacob Steinhardt Zhou Yu Kathleen McKeown LRM 34 47 0 17 Jul 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap Q. V. Liao J. Vaughan 38 158 0 02 Jun 2023
Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification Jiasheng Si Yingjie Zhu Deyu Zhou AAML 52 3 0 16 May 2023
Computational modeling of semantic change Nina Tahmasebi Haim Dubossarsky 34 6 0 13 Apr 2023
REV: Information-Theoretic Evaluation of Free-Text Rationales Hanjie Chen Faeze Brahman Xiang Ren Yangfeng Ji Yejin Choi Swabha Swayamdipta 92 23 0 10 Oct 2022
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 328 4,077 0 24 May 2022
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations Jaehun Jung Lianhui Qin Sean Welleck Faeze Brahman Chandra Bhagavatula Ronan Le Bras Yejin Choi ReLM LRM 229 190 0 24 May 2022
The Solvability of Interpretability Evaluation Metrics Yilun Zhou J. Shah 70 8 0 18 May 2022
Naturalistic Causal Probing for Morpho-Syntax Afra Amini Tiago Pimentel Clara Meister Ryan Cotterell MILM 106 18 0 14 May 2022
Do Transformer Models Show Similar Attention Patterns to Task-Specific Human Gaze? Stephanie Brandl Oliver Eberle Jonas Pilot Anders Søgaard 69 33 0 25 Apr 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models Xuezhi Wang Jason W. Wei Dale Schuurmans Quoc Le Ed H. Chi Sharan Narang Aakanksha Chowdhery Denny Zhou ReLM BDL LRM AI4CE 314 3,248 0 21 Mar 2022
Rethinking Attention-Model Explainability through Faithfulness Violation Test Y. Liu Haoliang Li Yangyang Guo Chen Kong Jing Li Shiqi Wang FAtt 121 42 0 28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 389 8,495 0 28 Jan 2022
Causal Distillation for Language Models Zhengxuan Wu Atticus Geiger J. Rozner Elisa Kreiss Hanson Lu Thomas F. Icard Christopher Potts Noah D. Goodman 89 25 0 05 Dec 2021
"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification Jasmijn Bastings Sebastian Ebert Polina Zablotskaia Anders Sandholm Katja Filippova 115 75 0 14 Nov 2021
Probing Language Models for Understanding of Temporal Expressions Shivin Thukral Kunal Kukreja Christian Kavouras 88 19 0 03 Oct 2021
BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief Nora Kassner Oyvind Tafjord Hinrich Schütze Peter Clark KELM LRM 245 64 0 29 Sep 2021
Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords Taelin Karidi Yichu Zhou Nathan Schneider Omri Abend Vivek Srikumar 86 13 0 23 Sep 2021
Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? Rochelle Choenni Ekaterina Shutova R. Rooij 88 29 0 21 Sep 2021
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models Goro Kobayashi Tatsuki Kuribayashi Sho Yokoi Kentaro Inui 160 46 0 15 Sep 2021
Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses Aina Garí Soler Marianna Apidianaki MILM 209 68 0 29 Apr 2021
Explaining Answers with Entailment Trees Bhavana Dalvi Peter Alexander Jansen Oyvind Tafjord Zhengnan Xie Hannah Smith Leighanna Pipatanangkura Peter Clark ReLM FAtt LRM 239 184 0 17 Apr 2021
Measuring Association Between Labels and Free-Text Rationales Sarah Wiegreffe Ana Marasović Noah A. Smith 282 170 0 24 Oct 2020
Probing Linguistic Systematicity Emily Goodwin Koustuv Sinha Timothy J. O'Donnell 96 58 0 08 May 2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks Chih-Kuan Yeh Been Kim Sercan Ö. Arik Chun-Liang Li Tomas Pfister Pradeep Ravikumar FAtt 122 297 0 17 Oct 2019
Language Models as Knowledge Bases? Fabio Petroni Tim Rocktäschel Patrick Lewis A. Bakhtin Yuxiang Wu Alexander H. Miller Sebastian Riedel KELM AI4MH 417 2,588 0 03 Sep 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets Mor Geva Yoav Goldberg Jonathan Berant 242 320 0 21 Aug 2019
e-SNLI: Natural Language Inference with Natural Language Explanations Oana-Maria Camburu Tim Rocktäschel Thomas Lukasiewicz Phil Blunsom LRM 260 620 0 04 Dec 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties Alexis Conneau Germán Kruszewski Guillaume Lample Loïc Barrault Marco Baroni 201 882 0 03 May 2018
Hypothesis Only Baselines in Natural Language Inference Adam Poliak Jason Naradowsky Aparajita Haldar Rachel Rudinger Benjamin Van Durme 190 576 0 02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,959 0 20 Apr 2018
Towards A Rigorous Science of Interpretable Machine Learning Finale Doshi-Velez Been Kim XAI FaML 257 3,684 0 28 Feb 2017