2202.05262
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Causal Inference with Large Language Model: A Survey
Jing Ma
CML
LRM
260
9
0
15 Sep 2024
Prevailing Research Areas for Music AI in the Era of Foundation Models
Megan Wei
M. Modrzejewski
Aswin Sivaraman
Dorien Herremans
MedIm
96
2
0
14 Sep 2024
Synthetic continued pretraining
Zitong Yang
Neil Band
Shuangping Li
Emmanuel Candès
Tatsunori Hashimoto
CLL
SyDa
110
16
0
11 Sep 2024
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
Anna Mészáros
Szilvia Ujváry
Wieland Brendel
Patrik Reizinger
Ferenc Huszár
105
0
0
09 Sep 2024
OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System
Xin Xu
Zekun Xi
Yujie Luo
Peng Wang
Bozhong Tian
...
Lei Liang
Qing Cui
Xiaowei Zhu
Jun Zhou
Huajun Chen
KELM
99
7
0
09 Sep 2024
Representational Analysis of Binding in Language Models
Qin Dai
Benjamin Heinzerling
Kentaro Inui
61
0
0
09 Sep 2024
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small
Maheep Chaudhary
Atticus Geiger
94
19
0
05 Sep 2024
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers
Amit Ben Artzy
Roy Schwartz
66
11
0
05 Sep 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
114
22
0
03 Sep 2024
Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models
Yifan Wei
Xiaoyan Yu
Yixuan Weng
Huanhuan Ma
Yuanzhe Zhang
Jun Zhao
Kang Liu
KELM
98
5
0
01 Sep 2024
Modularity in Transformers: Investigating Neuron Separability & Specialization
Nicholas Pochinkov
Thomas Jones
Mohammed Rashidur Rahman
58
0
0
30 Aug 2024
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning
Maxime Méloux
Christophe Cerisara
KELM
CLL
91
0
0
30 Aug 2024
How Reliable are Causal Probing Interventions?
Marc E. Canby
Adam Davies
Chirag Rastogi
Julia Hockenmaier
69
0
0
28 Aug 2024
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models
Xiyu Liu
Zhengxiao Liu
Naibin Gu
Zheng Lin
Wanli Ma
Ji Xiang
Weiping Wang
KELM
105
2
0
27 Aug 2024
Can Transformers Do Enumerative Geometry?
Baran Hashemi
Roderic G. Corominas
Alessandro Giacchetto
542
5
0
27 Aug 2024
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li
Hanxun Huang
Yunhan Zhao
Xingjun Ma
Jun Sun
AAML
SILM
113
19
0
23 Aug 2024
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He
Jascha Achterberg
Katie Collins
Kevin K. Nejad
Danyal Akarca
...
Chole Li
Kai J. Sandbrink
Stephen Casper
Anna Ivanova
Grace W. Lindsay
AI4CE
110
2
0
22 Aug 2024
Enhancing Multi-hop Reasoning through Knowledge Erasure in Large Language Model Editing
Mengqi Zhang
Bowen Fang
Qiang Liu
Fajie Yuan
Shu Wu
Zhumin Chen
Liang Wang
KELM
77
6
0
22 Aug 2024
EEG-Defender: Defending against Jailbreak through Early Exit Generation of Large Language Models
Chongwen Zhao
Zhihao Dou
Kaizhu Huang
AAML
69
3
0
21 Aug 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
134
8
0
21 Aug 2024
MEGen: Generative Backdoor in Large Language Models via Model Editing
Jiyang Qiu
Xinbei Ma
Zhuosheng Zhang
Hai Zhao
AAML
KELM
SILM
84
5
0
20 Aug 2024
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
Maxim Ifergan
Leshem Choshen
Roee Aharoni
Idan Szpektor
Omri Abend
HILM
101
4
0
20 Aug 2024
KAN 2.0: Kolmogorov-Arnold Networks Meet Science
Ziming Liu
Pingchuan Ma
Yixuan Wang
Wojciech Matusik
Max Tegmark
129
75
0
19 Aug 2024
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Qizhou Chen
Taolin Zhang
Chengyu Wang
Xiaofeng He
Dakan Wang
Tingting Liu
KELM
186
4
0
19 Aug 2024
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA
Jiaang Li
Quan Wang
Zhongnan Wang
Yongdong Zhang
Zhendong Mao
CLL
KELM
92
0
0
19 Aug 2024
Activated Parameter Locating via Causal Intervention for Model Merging
Fanshuang Kong
Richong Zhang
Ziqiao Wang
MoMe
54
2
0
18 Aug 2024
Lower Layers Matter: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused
Dingwei Chen
Feiteng Fang
Shiwen Ni
Feng Liang
Xiping Hu
A. Argha
Hamid Alinejad-Rokny
Min Yang
Chengming Li
HILM
66
4
0
16 Aug 2024
Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
Geonhee Kim
Marco Valentino
André Freitas
LRM
AI4CE
108
11
0
16 Aug 2024
Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models
Chenhui Hu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
145
3
0
14 Aug 2024
Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks
Verna Dankers
Ivan Titov
91
5
0
09 Aug 2024
UNLEARN Efficient Removal of Knowledge in Large Language Models
Tyler Lizzo
Larry Heck
KELM
MoMe
MU
69
1
0
08 Aug 2024
KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang
Yongxin Xu
Yuzhen Xiao
Runchuan Zhu
Xinke Jiang
Xu Chu
Junfeng Zhao
Yasha Wang
80
4
0
06 Aug 2024
Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons
Yifei Wang
Yuheng Chen
Wanting Wen
Yu Sheng
Linjing Li
D. Zeng
KELM
105
9
0
06 Aug 2024
The Mechanics of Conceptual Interpretation in GPT Models: Interpretative Insights
Nura Aljaafari
Danilo S. Carvalho
André Freitas
KELM
62
0
0
05 Aug 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
132
25
0
02 Aug 2024
An Encoding–Searching Separation Perspective on Bi-Encoder Neural Search
Danbinaerin Han
Akiko Aizawa
Sihun Lee
69
0
0
02 Aug 2024
Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment
Sangwon Yu
Jongyoon Song
Bongkyu Hwang
Hoyoung Kang
Sooah Cho
Junhwa Choi
Seongho Joe
Taehee Lee
Youngjune Gwon
Sungroh Yoon
237
6
0
31 Jul 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali
Jia He
C. Barberan
Richard Anarfi
96
7
0
30 Jul 2024
Machine Unlearning in Generative AI: A Survey
Zheyuan Liu
Guangyao Dou
Zhaoxuan Tan
Yijun Tian
Meng Jiang
MU
109
19
0
30 Jul 2024
Can Editing LLMs Inject Harm?
Canyu Chen
Baixiang Huang
Zekun Li
Zhaorun Chen
Shiyang Lai
...
Xifeng Yan
William Wang
Philip Torr
Dawn Song
Kai Shu
KELM
157
15
0
29 Jul 2024
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability
Jorge García-Carrasco
A. Maté
Juan Trujillo
AAML
80
4
0
29 Jul 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
138
16
0
27 Jul 2024
Demystifying Verbatim Memorization in Large Language Models
Jing Huang
Diyi Yang
Christopher Potts
ELM
PILM
MU
145
28
0
25 Jul 2024
Model editing for distribution shifts in uranium oxide morphological analysis
Davis Brown
Cody Nizinski
Madelyn Shapiro
Corey Fallon
Tianzhixi Yin
Henry Kvinge
Jonathan Tu
93
0
0
22 Jul 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
145
39
0
22 Jul 2024
Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
Guang-Da Liu
Haitao Mao
Jiliang Tang
K. Johnson
LRM
100
8
0
21 Jul 2024
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Sarah Wiegreffe
Oyvind Tafjord
Yonatan Belinkov
Hanna Hajishirzi
Ashish Sabharwal
94
9
0
21 Jul 2024
LeKUBE: A Legal Knowledge Update BEnchmark
Changyue Wang
Weihang Su
Yiran Hu
Qingyao Ai
Yueyue Wu
Cheng Luo
Yiqun Liu
Min Zhang
Shaoping Ma
AILaw
ELM
85
6
0
19 Jul 2024
Investigating the Indirect Object Identification circuit in Mamba
Danielle Ensign
Adrià Garriga-Alonso
Mamba
83
0
0
19 Jul 2024
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin
Martin Rinard
92
1
0
18 Jul 2024