2305.15853
Sequential Integrated Gradients: a simple but effective method for explaining language models
25 May 2023
Joseph Enguehard
Papers citing
"Sequential Integrated Gradients: a simple but effective method for explaining language models"
6 / 6 papers shown
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA
AI4CE
54
0
0
05 Jun 2025
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
Stanley Yu
Vaidehi Bulusu
Oscar Yasunaga
Clayton Lau
Cole Blondin
Sean O'Brien
Kevin Zhu
Vasu Sharma
49
0
0
27 May 2025
Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models
Sepehr Kamahi
Yadollah Yaghoobzadeh
146
0
0
21 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
124
16
0
27 Jul 2024
Unveiling LLM Mechanisms Through Neural ODEs and Control Theory
Yukun Zhang
Qi Dong
109
0
0
23 Jun 2024
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt
FaML
1.2K
17,124
0
16 Feb 2016