2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
GIM: Improved Interpretability for Large Language Models
Joakim Edin
Róbert Csordás
Tuukka Ruotsalo
Zhengxuan Wu
Maria Maistro
Jing-ling Huang
Lars Maaløe
126
0
0
23 May 2025
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models
Hwiyeong Lee
Uiji Hwang
Hyelim Lim
Taeuk Kim
MU
96
1
0
22 May 2025
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
Ercong Nie
Helmut Schmid
Hinrich Schütze
86
0
0
22 May 2025
CUB: Benchmarking Context Utilisation Techniques for Language Models
Lovisa Hagström
Youna Kim
Haeun Yu
Sang-goo Lee
Richard Johansson
Hyunsoo Cho
Isabelle Augenstein
70
1
0
22 May 2025
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Yuqing Yang
Robin Jia
KELM
LRM
124
1
0
22 May 2025
The Rise of Parameter Specialization for Knowledge Storage in Large Language Models
Yihuai Hong
Yiran Zhao
Wei Tang
Yang Deng
Yu Rong
Wenxuan Zhang
KELM
47
0
0
22 May 2025
Sparse Activation Editing for Reliable Instruction Following in Narratives
Runcong Zhao
Chengyu Cao
Qinglin Zhu
Xiucheng Lv
Shun Shao
Lin Gui
Ruifeng Xu
Yulan He
62
0
0
22 May 2025
Pre-training Large Memory Language Models with Internal and External Knowledge
Linxi Zhao
Sofian Zalouk
Christian K. Belardi
Justin Lovelace
Jin Peng Zhou
Kilian Q. Weinberger
Yoav Artzi
Jennifer J. Sun
KELM
HILM
112
0
0
21 May 2025
Large Language Models as Computable Approximations to Solomonoff Induction
Jun Wan
Lingrui Mei
69
0
0
21 May 2025
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
Michal Golovanevsky
William Rudman
Michael Lepori
Amir Bar
Ritambhara Singh
Carsten Eickhoff
99
0
0
21 May 2025
One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks
Quan Nguyen
Thanh Nguyen-Tang
MLT
90
0
0
21 May 2025
Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
Sasha Boguraev
Christopher Potts
Kyle Mahowald
26
0
0
21 May 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li
Xu Wang
Yuzhe Yang
Ziyu Yao
Haoyi Xiong
Jundong Li
LLMSV
LRM
129
3
0
21 May 2025
LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing
Peng Wang
Biyu Zhou
Xuehai Tang
Jizhong Han
Songlin Hu
KELM
126
0
0
21 May 2025
An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations
Yiming Huang
Biquan Bie
Zuqiu Na
Weilin Ruan
Songxin Lei
Yutao Yue
Xinlei He
85
0
0
21 May 2025
LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models
Zhanyue Qin
Yue Ding
Deyuan Liu
Qingbin Liu
Junxian Cai
Xi Chen
Zhiying Tu
Dianhui Chu
Cuiyun Gao
Dianbo Sui
88
0
0
21 May 2025
Editing Across Languages: A Survey of Multilingual Knowledge Editing
Nadir Durrani
Basel Mousi
Fahim Dalvi
KELM
111
0
0
20 May 2025
Temporal Alignment of Time Sensitive Facts with Activation Engineering
Sanjay Govindan
Maurice Pagnucco
Yang Song
KELM
LLMSV
AI4CE
117
0
0
20 May 2025
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Zhipeng Yang
Junzhuo Li
Siyu Xia
Xuming Hu
AIFin
LRM
120
0
0
20 May 2025
Language Models use Lookbacks to Track Beliefs
Nikhil Prakash
Natalie Shapira
Arnab Sen Sharma
Christoph Riedl
Yonatan Belinkov
Tamar Rott Shaham
David Bau
Atticus Geiger
KELM
82
1
0
20 May 2025
Explaining Neural Networks with Reasons
Levin Hornischer
Hannes Leitgeb
FAtt
AAML
MILM
109
0
0
20 May 2025
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
Jiakuan Xie
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
31
0
0
19 May 2025
Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection
Yuwei Zhang
Wenhao Yu
Shangbin Feng
Yifan Zhu
Letian Peng
Jayanth Srinivasa
Gaowen Liu
Jingbo Shang
KELM
80
2
0
18 May 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
Amirbek Djanibekov
Nurdaulet Mukhituly
Kentaro Inui
Hanan Aldarmaki
Nils Lukas
AAML
87
0
0
18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
98
0
0
18 May 2025
From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling
Mohsinul Kabir
Tasfia Tahsin
Sophia Ananiadou
KELM
AI4CE
68
0
0
18 May 2025
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
Yanan Li
Fanxu Meng
Muhan Zhang
Shiai Zhu
Shangguang Wang
Mengwei Xu
MoMe
85
0
0
17 May 2025
NAMET: Robust Massive Model Editing via Noise-Aware Memory Optimization
Yanbo Dai
Zhenlan Ji
Zongjie Li
Shuai Wang
KELM
64
0
0
17 May 2025
Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation
Reilly Haskins
Benjamin Adams
78
0
0
16 May 2025
The Way We Prompt: Conceptual Blending, Neural Dynamics, and Prompt-Induced Transitions in LLMs
Makoto Sato
56
0
0
16 May 2025
Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates
Hang Chen
Jiaying Zhu
Xinyu Yang
Wenya Wang
LRM
86
0
0
15 May 2025
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
Jingcheng Niu
Xingdi Yuan
Tong Wang
Hamidreza Saghir
Amir H. Abdi
81
0
0
14 May 2025
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
85
0
0
13 May 2025
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
LRM
158
2
0
13 May 2025
DeltaEdit: Enhancing Sequential Editing in Large Language Models by Controlling Superimposed Noise
Ding Cao
Yuchen Cai
Rongxi Guo
Xiaoxiao He
Guiquan Liu
KELM
170
0
0
12 May 2025
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
148
0
0
09 May 2025
Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization
Yuntai Bao
Xuhong Zhang
Tianyu Du
Xinkui Zhao
Jiang Zong
Hao Peng
Yuxiang Cai
TDI
134
0
0
08 May 2025
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
151
2
0
08 May 2025
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
AAML
SILM
230
2
0
07 May 2025
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Xiaoyu Xu
Minxin Du
Qingqing Ye
Haibo Hu
MU
167
1
0
07 May 2025
Causal Intervention Framework for Variational Auto Encoder Mechanistic Interpretability
Dip Roy
CML
36
0
0
06 May 2025
Interpreting Multilingual and Document-Length Sensitive Relevance Computations in Neural Retrieval Models through Axiomatic Causal Interventions
Oliver Savolainen
Dur e Najaf Amjad
Roxana Petcu
AAML
73
0
0
04 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
246
5
0
01 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Vaidehi Patil
Yi-Lin Sung
Peter Hase
Jie Peng
Jen-tse Huang
Joey Tianyi Zhou
AAML
MU
287
4
0
01 May 2025
Memorization and Knowledge Injection in Gated LLMs
Xu Pan
Ely Hahami
Zechen Zhang
H. Sompolinsky
KELM
CLL
RALM
166
1
0
30 Apr 2025
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Zhengfu He
Jiadong Wang
Rui Lin
Xuyang Ge
Wentao Shu
Qiong Tang
J.N. Zhang
Xipeng Qiu
132
0
0
29 Apr 2025
SetKE: Knowledge Editing for Knowledge Elements Overlap
Yifan Wei
Xiaoyan Yu
Ran Song
Hao Peng
Angsheng Li
KELM
102
0
0
29 Apr 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia
Camille Couturier
Daniel Madrigal Diaz
Ankur Mallick
Anastasios Kyrillidis
Robert Sim
Victor Rühle
Saravan Rajmohan
77
1
0
23 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
156
1
0
22 Apr 2025
Exploiting Contextual Knowledge in LLMs through V-usable Information based Layer Enhancement
Xiaowei Yuan
Zhao Yang
Ziyang Huang
Yucheng Wang
Siqi Fan
Yiming Ju
Jun Zhao
Kang Liu
81
0
0
22 Apr 2025