Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.14767
Cited By
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
28 April 2023
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dissecting Recall of Factual Associations in Auto-Regressive Language Models"
50 / 69 papers shown
Title
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
31
0
0
09 May 2025
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Xiaoyu Xu
Minxin Du
Qingqing Ye
Haibo Hu
MU
57
0
0
07 May 2025
Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang
Benjamin Bergen
53
0
0
21 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
37
1
0
06 Apr 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel DÓosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
71
1
0
13 Mar 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi
Gal Niv
LRM
ReLM
65
0
0
03 Mar 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
44
3
0
24 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge Editing
Pinzheng Wang
Zecheng Tang
Keyan Zhou
J. Li
Qiaoming Zhu
Hao Fei
KELM
124
2
0
21 Feb 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan
Arnab Sen Sharma
Silvio Amir
David Bau
Byron C. Wallace
88
0
0
20 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
150
0
0
17 Feb 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu
Ningyuan Zhao
Liming Yang
Sendong Zhao
Shumin Deng
Mengru Wang
Bryan Hooi
Nay Oo
H. Chen
N. Zhang
KELM
CLL
MU
213
0
0
16 Feb 2025
Learning Task Representations from In-Context Learning
Baturay Saglam
Zhuoran Yang
Dionysis Kalogerias
Amin Karbasi
60
1
0
08 Feb 2025
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
Go Kamoda
Benjamin Heinzerling
Tatsuro Inaba
Keito Kudo
Keisuke Sakaguchi
Kentaro Inui
MILM
36
1
0
27 Jan 2025
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
Zeping Yu
Sophia Ananiadou
KELM
47
1
0
24 Jan 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
49
0
0
22 Jan 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang
Yong Zhang
Ning Cheng
Zhitao Li
Shaojun Wang
Jing Xiao
91
0
0
02 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
65
7
0
31 Dec 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
108
4
0
23 Nov 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
85
12
0
21 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
184
0
0
17 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
58
3
0
11 Nov 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
43
0
0
18 Oct 2024
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
Ahmed Oumar El-Shangiti
Tatsuya Hiraoka
Hilal AlQuabeh
Benjamin Heinzerling
Kentaro Inui
44
1
0
17 Oct 2024
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
41
5
0
16 Oct 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
64
1
0
13 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Ziqian Zeng
Di Wang
Haiqin Yang
AAML
MU
45
2
0
09 Oct 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo
Luke Ong
Philip Torr
Mor Geva
David M. Krueger
Fazl Barez
92
6
0
09 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
52
12
0
08 Oct 2024
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota
Sheridan Feucht
Samuel Marks
David Bau
KELM
ELM
MU
50
6
0
03 Oct 2024
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Philipp Mondorf
Sondre Wold
Barbara Plank
39
0
0
02 Oct 2024
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim
Hyunji Lee
Hyowon Cho
Joel Jang
Hyeonbin Hwang
Seungpil Won
Youbin Ahn
Dohaeng Lee
Minjoon Seo
KELM
149
3
0
02 Oct 2024
Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee
Thomas Jiralerspong
Lei Yu
Yoshua Bengio
Emily Cheng
CoGe
87
0
0
02 Oct 2024
Quantifying reliance on external information over parametric knowledge during Retrieval Augmented Generation (RAG) using mechanistic analysis
Reshmi Ghosh
Rahul Seetharaman
Hitesh Wadhwa
Somyaa Aggarwal
Samyadeep Basu
Soundararajan Srinivasan
Wenlong Zhao
Shreyas Chaudhari
Ehsan Aghazadeh
35
0
0
01 Oct 2024
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Zeping Yu
Sophia Ananiadou
LRM
MILM
27
7
0
21 Sep 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
Yue Zhang
ALM
67
5
0
21 Aug 2024
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
Geonhee Kim
Marco Valentino
André Freitas
LRM
AI4CE
30
7
0
16 Aug 2024
Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models
Chenhui Hu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
61
2
0
14 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models
Christopher Burger
Yifan Hu
Thai Le
KELM
49
0
0
22 Jun 2024
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang
Jinho Park
Seonghyeon Ye
Sohee Yang
Youngkyung Seo
Du-Seong Chang
Minjoon Seo
KELM
37
33
0
17 Jun 2024
Understanding Information Storage and Transfer in Multi-modal Large Language Models
Samyadeep Basu
Martin Grayson
C. Morrison
Besmira Nushi
S. Feizi
Daniela Massiceti
28
10
0
06 Jun 2024
LoFiT: Localized Fine-tuning on LLM Representations
Fangcong Yin
Xi Ye
Greg Durrett
38
13
0
03 Jun 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
72
20
0
28 May 2024
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Emily Cheng
Diego Doimo
Corentin Kervadec
Iuri Macocco
Jade Yu
Alessandro Laio
Marco Baroni
112
11
0
24 May 2024
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Bernal Jiménez Gutiérrez
Yiheng Shu
Yu Gu
Michihiro Yasunaga
Yu-Chuan Su
RALM
CLL
68
33
0
23 May 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
115
0
28 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
52
81
0
26 Mar 2024
Investigating Continual Pretraining in Large Language Models: Insights and Implications
cCaugatay Yildiz
Nishaanth Kanna Ravichandran
Prishruit Punia
Matthias Bethge
Beyza Ermis
CLL
KELM
LRM
58
25
0
27 Feb 2024
Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
Yuheng Chen
Pengfei Cao
Yubo Chen
Yining Wang
Shengping Liu
Kang Liu
Jun Zhao
KELM
37
1
0
21 Feb 2024
1
2
Next