Post-hoc Interpretability for Neural NLP: A Survey
Andreas Madsen, Siva Reddy, A. Chandar. arXiv:2108.04840, 10 August 2021. [XAI]
Papers citing "Post-hoc Interpretability for Neural NLP: A Survey" (47 papers):
Display Content, Display Methods and Evaluation Methods of the HCI in Explainable Recommender Systems: A Survey
Weiqing Li, Yue Xu, Yuefeng Li, Yinghui Huang. 14 May 2025.

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs, Shihan Wang, Antske Fokkens. 09 May 2025.

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci. 02 May 2025.

Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi, Gal Niv. 03 Mar 2025. [LRM, ReLM]

Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization
Or Raphael Bidusa, Shaul Markovitch. 20 Feb 2025.

FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation
Qianli Wang, Nils Feldhus, Simon Ostermann, Luis Felipe Villa-Arenas, Sebastian Möller, Vera Schmitt. 01 Jan 2025. [AAML]

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, L. Bentivogli. 03 Nov 2024.

On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart. 27 Jul 2024.

Latent Concept-based Explanation of NLP Models
Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad. 18 Apr 2024. [LRM, FAtt]

Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending
Mario Sanz-Guerrero, Javier Arroyo. 29 Jan 2024.

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva. 11 Jan 2024.

HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments
Neeraj Kumar Singh, Koyel Ghosh, Joy Mahapatra, Utpal Garain, Apurbalal Senapati. 20 Dec 2023.

Interpreting Pretrained Language Models via Concept Bottlenecks
Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Wenlin Yao, Huan Liu. 08 Nov 2023. [LRM]

Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman. 26 Oct 2023.

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations
Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller. 09 Oct 2023.

AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao, J. Vaughan. 02 Jun 2023.

Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition
Xiao-lan Wu, P. Bell, A. Rajan. 29 May 2023.

Computational modeling of semantic change
Nina Tahmasebi, Haim Dubossarsky. 13 Apr 2023.

Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers
P. Jalali, Nengfeng Zhou, Yufei Yu. 06 Mar 2023. [AAML]

IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models
Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh. 06 Mar 2023.

Explanations for Automatic Speech Recognition
Xiao-lan Wu, P. Bell, A. Rajan. 27 Feb 2023.

A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries
P. Kudva, R. Bordawekar, Apoorva Nitsure. 23 Feb 2023.

Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations
Valerie Chen, Q. V. Liao, Jennifer Wortman Vaughan, Gagan Bansal. 18 Jan 2023.

Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation
O. Serikov, Vitaly Protasov, E. Voloshina, V. Knyazkova, Tatiana Shavrina. 24 Oct 2022.

Explainable Causal Analysis of Mental Health on Social Media Data
Chandni Saxena, Muskan Garg, G. Saxena. 16 Oct 2022. [CML]

Review of Natural Language Processing in Pharmacology
D. Trajanov, Vangel Trajkovski, Makedonka Dimitrieva, Jovana Dobreva, Milos Jovanovik, Matej Klemen, Aleš Žagar, Marko Robnik-Šikonja. 22 Aug 2022. [LM&MA]

ferret: a Framework for Benchmarking Explainers on Transformers
Giuseppe Attanasio, Eliana Pastor, C. Bonaventura, Debora Nozza. 02 Aug 2022.

Is Attention Interpretation? A Quantitative Assessment On Sets
Jonathan Haab, N. Deutschmann, María Rodríguez Martínez. 26 Jul 2022.

Mediators: Conversational Agents Explaining NLP Model Behavior
Nils Feldhus, A. Ravichandran, Sebastian Möller. 13 Jun 2022.

Interactive Model Cards: A Human-Centered Approach to Model Documentation
Anamaria Crisan, Margaret Drouhard, Jesse Vig, Nazneen Rajani. 05 May 2022. [HAI]

Interpretation of Black Box NLP Models: A Survey
Shivani Choudhary, N. Chatterjee, S. K. Saha. 31 Mar 2022. [FAtt]

Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà. 08 Mar 2022.

Interpreting Language Models with Contrastive Explanations
Kayo Yin, Graham Neubig. 21 Feb 2022. [MILM]

"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders Sandholm, Katja Filippova. 14 Nov 2021.

Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and Future Opportunities
Waddah Saeed, C. Omlin. 11 Nov 2021. [XAI]

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining
Andreas Madsen, Nicholas Meade, Vaibhav Adlakha, Siva Reddy. 15 Oct 2021.

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov. 24 Feb 2021.

UnNatural Language Inference
Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams. 30 Dec 2020.

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Samson Tan, Shafiq R. Joty, Min-Yen Kan, R. Socher. 09 May 2020.

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020.

A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan. 23 Aug 2019. [SyDa, FaML]

e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom. 04 Dec 2018. [LRM]

What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni. 03 May 2018.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. [ELM]

A causal framework for explaining the predictions of black-box sequence-to-sequence models
David Alvarez-Melis, Tommi Jaakkola. 06 Jul 2017. [CML]

Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim. 28 Feb 2017. [XAI, FaML]

Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean. 16 Jan 2013. [3DV]