Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?

7 April 2020
Alon Jacovi, Yoav Goldberg
XAI
arXiv:2004.03685

Papers citing "Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?"

50 / 381 papers shown
Fixed Point Explainability
Emanuele La Malfa, Jon Vadillo, Marco Molinari, Michael Wooldridge
18 May 2025

LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models
Ryan Chen, Youngmin Ko, Zeyu Zhang, Catherine Cho, Sunny Chung, Mauro Giuffré, Dennis L. Shung, Bradly C. Stadie
17 May 2025

Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation
Yuhao Wang, Ruiyang Ren, Yucheng Wang, Wayne Xin Zhao, Jing Liu, Hua Wu, Haifeng Wang
17 May 2025

DSADF: Thinking Fast and Slow for Decision Making
Alex Zhihao Dou, Dongfei Cui, Jun Yan, Wei Wang, Benteng Chen, Haoming Wang, Zeke Xie, Shufei Zhang
OffRL
13 May 2025

From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
Moritz Vandenhirtz, Julia E. Vogt
09 May 2025

Reasoning Models Don't Always Say What They Think
Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson E. Denison, ..., Vlad Mikulik, Samuel R. Bowman, Jan Leike, Jared Kaplan, E. Perez
ReLM, LRM
08 May 2025

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li
04 May 2025

PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan, Santosh Patapati
03 May 2025

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci
02 May 2025

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton, Robert Osazuwa Ness, John Guttag, Emre Kıcıman
19 Apr 2025

A constraints-based approach to fully interpretable neural networks for detecting learner behaviors
Juan D. Pinto, Luc Paquette
10 Apr 2025

A Meaningful Perturbation Metric for Evaluating Explainability Methods
Danielle Cohen, Hila Chefer, Lior Wolf
AAML
09 Apr 2025

LExT: Towards Evaluating Trustworthiness of Natural Language Explanations
Krithi Shailya, Shreya Rajpal, Gokul S Krishnan, Balaraman Ravindran
ELM
08 Apr 2025

Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Ferreira, Wilker Aziz, Ivan Titov
LRM
07 Apr 2025

CFIRE: A General Method for Combining Local Explanations
Sebastian Müller, Vanessa Toborek, Tamás Horváth, Christian Bauckhage
FAtt
01 Apr 2025

LLMs for Explainable AI: A Comprehensive Survey
Ahsan Bilal, David Ebert, Beiyu Lin
31 Mar 2025

On Explaining (Large) Language Models For Code Using Global Code-Based Explanations
David Nader-Palacio, Dipin Khati, Daniel Rodríguez-Cárdenas, Alejandro Velasco, Denys Poshyvanyk
LRM
21 Mar 2025

Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance
Noah Y. Siegel, N. Heess, Maria Perez-Ortiz, Oana-Maria Camburu
LRM
17 Mar 2025

Reasoning-Grounded Natural Language Explanations for Language Models
Vojtech Cahlik, Rodrigo Alves, Pavel Kordík
LRM
14 Mar 2025

Combining Causal Models for More Accurate Abstractions of Neural Networks
Theodora-Mara Pîslar, Sara Magliacane, Atticus Geiger
AI4CE
14 Mar 2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, M. Guan, Aleksander Mądry, Wojciech Zaremba, J. Pachocki, David Farhi
LRM
14 Mar 2025

Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
Danielle Villa, Maria Chang, K. Murugesan, Rosario A. Uceda-Sosa, Karthikeyan N. Ramamurthy
LRM
11 Mar 2025

A Unified Framework with Novel Metrics for Evaluating the Effectiveness of XAI Techniques in LLMs
Melkamu Mersha, Mesay Gemeda Yigezu, Hassan Shakil, Ali Al shami, SangHyun Byun, Jugal Kalita
06 Mar 2025

A Causal Lens for Evaluating Faithfulness Metrics
Kerem Zaman, Shashank Srivastava
26 Feb 2025

Can LLMs Explain Themselves Counterfactually?
Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar
LRM
25 Feb 2025

Comparing zero-shot self-explanations with human rationales in text classification
Stephanie Brandl, Oliver Eberle
24 Feb 2025

Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung, Aurora Cobo, Pablo Serna
SyDa, HILM
24 Feb 2025

A Survey of Model Architectures in Information Retrieval
Zhichao Xu, Fengran Mo, Zhiqi Huang, Crystina Zhang, Puxuan Yu, Bei Wang, Jimmy J. Lin, Vivek Srikumar
KELM, 3DV
21 Feb 2025

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
L. Arras, Bruno Puri, Patrick Kahardipraja, Sebastian Lapuschkin, Wojciech Samek
21 Feb 2025

Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Greta Warren, Irina Shklovski, Isabelle Augenstein
OffRL
13 Feb 2025

Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies
Sunnie S. Y. Kim, J. Vaughan, Q. V. Liao, Tania Lombrozo, Olga Russakovsky
12 Feb 2025

Is Conversational XAI All You Need? Human-AI Decision Making With a Conversational XAI Assistant
Gaole He, Nilay Aishwarya, U. Gadiraju
29 Jan 2025

A Study of the Plausibility of Attention between RNN Encoders in Natural Language Inference
Duc Hau Nguyen, Pascale Sébillot
23 Jan 2025

Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen, Cyrielle Mallart, Guillaume Gravier, Pascale Sébillot
22 Jan 2025

ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability
Antonin Poché, Alon Jacovi, Agustin Picard, Victor Boutin, Fanny Jourdan
10 Jan 2025

Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts
Andrew Halterman, Katherine A. Keith
10 Jan 2025

Explainable Time Series Prediction of Tyre Energy in Formula One Race Strategy
Jamie Todd, Junqi Jiang, Aaron Russo, Steffen Winkler, Stuart Sale, Joseph McMillan, Antonio Rago
AI4TS
07 Jan 2025

Boosting Explainability through Selective Rationalization in Pre-trained Language Models
Libing Yuan, Shuaibo Hu, Kui Yu, Le Wu
LRM
03 Jan 2025

Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry
Supriya Manna, Niladri Sett
30 Dec 2024

Can Highlighting Help GitHub Maintainers Track Security Fixes?
Xueqing Liu, Yuchen Xiong, Qiushi Liu, Jiangrui Zheng
18 Nov 2024

Explanations that reveal all through the definition of encoding
A. Puli, Nhi Nguyen, Rajesh Ranganath
FAtt, XAI
04 Nov 2024

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, L. Bentivogli
03 Nov 2024

Causal Abstraction in Model Interpretability: A Compact Survey
Yihao Zhang
26 Oct 2024

Evaluating the Influences of Explanation Style on Human-AI Reliance
Emma Casolin, Flora D. Salim, Ben Newell
26 Oct 2024

On Explaining with Attention Matrices
Omar Naim, Nicholas Asher
24 Oct 2024

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar
22 Oct 2024

XForecast: Evaluating Natural Language Explanations for Time Series Forecasting
Taha Aksu, Chenghao Liu, Amrita Saha, Sarah Tan, Caiming Xiong, Doyen Sahoo
AI4TS
18 Oct 2024

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
18 Oct 2024

Hypothesis Testing the Circuit Hypothesis in LLMs
Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei
16 Oct 2024

Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting
Maxime Kayser, Bayar I. Menzat, Cornelius Emde, Bogdan Bercean, Alex Novak, Abdala Espinosa, B. Papież, Susanne Gaube, Thomas Lukasiewicz, Oana-Maria Camburu
16 Oct 2024