Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang
Fali Wang
Xiaomin Li
Zongyu Wu
Xianfeng Tang
Hui Liu
Qi He
Wenpeng Yin
Suhang Wang
MU
113
18
0
21 Oct 2024
Identifying Sub-networks in Neural Networks via Functionally Similar Representations
Tian Gao
Amit Dhurandhar
Karthikeyan N. Ramamurthy
Dennis L. Wei
119
0
0
21 Oct 2024
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao
Alessio Devoto
Giwon Hong
Xiaotang Du
Aryo Pradipta Gema
Hongru Wang
Xuanli He
Kam-Fai Wong
Pasquale Minervini
KELM
LLMSV
143
28
0
21 Oct 2024
Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
Wei Jie Yeo
Ranjan Satapathy
Erik Cambria
74
2
0
18 Oct 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
133
1
0
18 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
84
14
0
17 Oct 2024
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix J Binder
James Chua
Tomek Korbak
Henry Sleight
John Hughes
Robert Long
Ethan Perez
Miles Turpin
Owain Evans
KELM
AIFin
LRM
100
17
0
17 Oct 2024
Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes
Dibyanayan Bandyopadhyay
Mohammed Hasanuzzaman
Asif Ekbal
AAML
55
1
0
17 Oct 2024
Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
Minseok Choi
C. Park
Dohyun Lee
Jaegul Choo
KELM
MU
60
1
0
17 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Sihang Li
Yongbin Li
171
10
0
17 Oct 2024
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
Ahmed Oumar El-Shangiti
Tatsuya Hiraoka
Hilal AlQuabeh
Benjamin Heinzerling
Kentaro Inui
146
1
0
17 Oct 2024
AERO: Softmax-Only LLMs for Efficient Private Inference
N. Jha
Brandon Reagen
117
5
0
16 Oct 2024
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LRM
83
5
0
16 Oct 2024
Neuron-based Personality Trait Induction in Large Language Models
Jia Deng
Tianyi Tang
Yanbin Yin
Wenhao Yang
Wayne Xin Zhao
Ji-Rong Wen
92
1
0
16 Oct 2024
SoK: Prompt Hacking of Large Language Models
Baha Rababah
Shang
Wu
Matthew Kwiatkowski
Carson Leung
Cuneyt Gurcan Akcora
AAML
60
3
0
16 Oct 2024
Reconstruction of Differentially Private Text Sanitization via Large Language Models
Shuchao Pang
Zhigang Lu
Haoran Wang
Peng Fu
Yongbin Zhou
Minhui Xue
AAML
142
5
0
16 Oct 2024
Interpreting token compositionality in LLMs: A robustness analysis
Nura Aljaafari
Danilo S. Carvalho
André Freitas
142
3
0
16 Oct 2024
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu
Liang Pang
Yunchang Zhu
Huawei Shen
Xueqi Cheng
MLLM
129
3
0
16 Oct 2024
The Persian Rug: solving toy models of superposition using large-scale symmetries
Aditya Cowsik
Kfir Dolev
Alex Infanger
44
0
0
15 Oct 2024
O-Edit: Orthogonal Subspace Editing for Language Model Sequential Editing
Yuchen Cai
Ding Cao
KELM
87
3
0
15 Oct 2024
A Theoretical Survey on Foundation Models
Shi Fu
Yuzhu Chen
Yingjie Wang
Dacheng Tao
90
0
0
15 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Zhongxiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
146
17
0
15 Oct 2024
Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Yekun Ke
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
103
3
0
15 Oct 2024
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Litu Rout
Yujia Chen
Nataniel Ruiz
Constantine Caramanis
Sanjay Shakkottai
Wen-Sheng Chu
DiffM
104
0
0
14 Oct 2024
Locking Down the Finetuned LLMs Safety
Minjun Zhu
Linyi Yang
Yifan Wei
Ningyu Zhang
Yue Zhang
108
14
0
14 Oct 2024
Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning
Yongxin Xu
Ruizhe Zhang
Xinke Jiang
Yujie Feng
Yuzhen Xiao
Xinyu Ma
Runchuan Zhu
Xu Chu
Junfeng Zhao
Yasha Wang
KELM
107
4
0
14 Oct 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng
Xidong Wang
Juhao Liang
Nuo Chen
Yuping Zheng
Benyou Wang
MoE
136
5
0
14 Oct 2024
Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi
Xuefeng Du
Yixuan Li
101
19
0
13 Oct 2024
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park
Chanwoong Yoon
Jungwoo Park
Donghyeon Lee
Minbyul Jeong
Jaewoo Kang
KELM
154
2
0
13 Oct 2024
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Jiamu Zheng
Jinghuai Zhang
Tianyu Du
Xuhong Zhang
Jianwei Yin
Tao Lin
KELM
265
0
0
12 Oct 2024
Inference and Verbalization Functions During In-Context Learning
Junyi Tao
Xiaoyin Chen
Nelson F. Liu
LRM
ReLM
94
1
0
12 Oct 2024
Keys to Robust Edits: from Theoretical Insights to Practical Advances
Jianhao Yan
Futing Wang
Yun Luo
Yafu Li
Yue Zhang
KELM
92
0
0
12 Oct 2024
Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models
Sitao Cheng
Liangming Pan
Xunjian Yin
Xinyi Wang
William Yang Wang
KELM
83
4
0
10 Oct 2024
Mitigating Gender Bias in Code Large Language Models via Model Editing
Zhan Qin
Haochuan Wang
Zecheng Wang
Deyuan Liu
Cunhang Fan
Zhao Lv
Zhiying Tu
Dianhui Chu
Dianbo Sui
KELM
89
2
0
10 Oct 2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li
Eric J. Michaud
David D. Baek
Joshua Engels
Xiaoqing Sun
Max Tegmark
119
21
0
10 Oct 2024
Uncovering Overfitting in Large Language Model Editing
Mengqi Zhang
Xiaotian Ye
Qiang Liu
Fajie Yuan
Shu Wu
Zhumin Chen
KELM
80
16
0
10 Oct 2024
Unlearning-based Neural Interpretations
Ching Lam Choi
Alexandre Duplessis
Serge Belongie
FAtt
268
0
0
10 Oct 2024
Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing
Weichuan Wang
Zhaoyi Li
Defu Lian
Chen Ma
Linqi Song
Ying Wei
103
8
0
09 Oct 2024
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang
Xuyang Ge
Wentao Shu
Qiong Tang
Yunhua Zhou
Zhengfu He
Xipeng Qiu
89
7
0
09 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Huiping Zhuang
Di Wang
Haiqin Yang
AAML
MU
77
4
0
09 Oct 2024
On the Similarity of Circuits across Languages: a Case Study on the Subject-verb Agreement Task
Javier Ferrando
Marta R. Costa-jussá
62
7
0
09 Oct 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo
Luke Ong
Philip Torr
Mor Geva
David M. Krueger
Fazl Barez
144
15
0
09 Oct 2024
Jet Expansions of Residual Computation
Yihong Chen
Xiangxiang Xu
Yao Lu
Pontus Stenetorp
Luca Franceschi
98
3
0
08 Oct 2024
Probing Language Models on Their Knowledge Source
Zineddine Tighidet
Andrea Mogini
Jiali Mei
Benjamin Piwowarski
Patrick Gallinari
KELM
87
1
0
08 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
121
14
0
08 Oct 2024
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang
Yongqian Li
Zijian Kan
Keyuan Cheng
Lijie Hu
Di Wang
KELM
83
13
0
08 Oct 2024
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
Tao Meng
Ninareh Mehrabi
Palash Goyal
Anil Ramakrishna
Aram Galstyan
Richard Zemel
Kai-Wei Chang
Rahul Gupta
Charith Peris
28
1
0
07 Oct 2024
Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models
M. Farahani
Richard Johansson
RALM
100
2
0
07 Oct 2024
Mechanistic?
Naomi Saphra
Sarah Wiegreffe
AI4CE
80
13
0
07 Oct 2024
FAME: Towards Factual Multi-Task Model Editing
Li Zeng
Yingyu Shan
Zeming Liu
Jiashu Yao
Yuhang Guo
KELM
51
2
0
07 Oct 2024
Previous
1
2
3
...
7
8
9
...
20
21
22
Next