Sequential Integrated Gradients: a simple but effective method for explaining language models

25 May 2023

Papers citing "Sequential Integrated Gradients: a simple but effective method for explaining language models"


Title | Authors | Tags | Counts | Date
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety | Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau | LM&MA, AI4CE | 54, 0, 0 | 05 Jun 2025
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs | Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O'Brien, Kevin Zhu, Vasu Sharma | - | 49, 0, 0 | 27 May 2025
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models | Sepehr Kamahi, Yadollah Yaghoobzadeh | - | 146, 0, 0 | 21 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs | Nitay Calderon, Roi Reichart | - | 124, 16, 0 | 27 Jul 2024
Unveiling LLM Mechanisms Through Neural ODEs and Control Theory | Yukun Zhang, Qi Dong | - | 109, 0, 0 | 23 Jun 2024
"Why Should I Trust You?": Explaining the Predictions of Any Classifier | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin | FAtt, FaML | 1.2K, 17,124, 0 | 16 Feb 2016