2202.05262
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing "Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
RECKONING: Reasoning through Dynamic Knowledge Encoding
Zeming Chen
Gail Weiss
E. Mitchell
Asli Celikyilmaz
Antoine Bosselut
KELM
LRM
108
13
0
10 May 2023
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Tao Hong
36
0
0
08 May 2023
Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Kaixin Ma
Hao Cheng
Yu Zhang
Xiaodong Liu
Eric Nyberg
Jianfeng Gao
LRM
81
16
0
04 May 2023
ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation
Pengfei Hong
Rishabh Bhardwaj
Navonil Majumder
Somak Aditya
Soujanya Poria
AAML
62
0
0
04 May 2023
Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge
Yasumasa Onoe
Michael J.Q. Zhang
Shankar Padmanabhan
Greg Durrett
Eunsol Choi
KELM
275
78
0
02 May 2023
Key-Locked Rank One Editing for Text-to-Image Personalization
Yoad Tewel
Rinon Gal
Gal Chechik
Yuval Atzmon
DiffM
262
174
0
02 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
290
218
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
338
132
0
30 Apr 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
72
319
0
28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
301
324
0
28 Apr 2023
Label-Free Concept Bottleneck Models
Tuomas P. Oikarinen
Subhro Das
Lam M. Nguyen
Tsui-Wei Weng
101
180
0
12 Apr 2023
Localizing Model Behavior with Path Patching
Nicholas W. Goldowsky-Dill
Chris MacLeod
L. Sato
Aryaman Arora
124
95
0
12 Apr 2023
Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez
Belinda Z. Li
Jacob Andreas
KELM
100
91
0
03 Apr 2023
Ablating Concepts in Text-to-Image Diffusion Models
Nupur Kumari
Bin Zhang
Sheng-Yu Wang
Eli Shechtman
Richard Y. Zhang
Jun-Yan Zhu
VLM
75
201
0
23 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
120
109
0
20 Mar 2023
Context-faithful Prompting for Large Language Models
Wenxuan Zhou
Sheng Zhang
Hoifung Poon
Muhao Chen
KELM
61
65
0
20 Mar 2023
Editing Implicit Assumptions in Text-to-Image Diffusion Models
Hadas Orgad
Bahjat Kawar
Yonatan Belinkov
DiffM
112
91
0
14 Mar 2023
Erasing Concepts from Diffusion Models
Rohit Gandikota
Joanna Materzyńska
Jaden Fiotto-Kaufman
David Bau
DiffM
138
313
0
13 Mar 2023
Making a Computational Attorney
Dell Zhang
Frank Schilder
Jack G. Conrad
Masoud Makrehchi
David von Rickenbach
Isabelle Moulinier
76
1
0
07 Mar 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas Icard
Noah D. Goodman
CML
187
112
0
05 Mar 2023
Competence-Based Analysis of Language Models
Adam Davies
Jize Jiang
Chengxiang Zhai
ELM
71
5
0
01 Mar 2023
Edit at your own risk: evaluating the robustness of edited models to distribution shifts
Davis Brown
Charles Godfrey
Cody Nizinski
Jonathan Tu
Henry Kvinge
KELM
85
8
0
28 Feb 2023
Inseq: An Interpretability Toolkit for Sequence Generation Models
Gabriele Sarti
Nils Feldhus
Ludwig Sickert
Oskar van der Wal
Malvina Nissim
Arianna Bisazza
131
70
0
27 Feb 2023
Analyzing And Editing Inner Mechanisms Of Backdoored Language Models
Max Lamparth
Anka Reuel
KELM
88
11
0
24 Feb 2023
Task-Specific Skill Localization in Fine-tuned Language Models
A. Panigrahi
Nikunj Saunshi
Haoyu Zhao
Sanjeev Arora
MoMe
102
75
0
13 Feb 2023
What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce
Zixin Wen
Yelong Shen
Yuan-Fang Li
91
34
0
07 Feb 2023
Effective Data Augmentation With Diffusion Models
Brandon Trabucco
Kyle Doherty
Max Gurinas
Ruslan Salakhutdinov
DiffM
VLM
125
258
0
07 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
114
16
0
01 Feb 2023
Do Multi-Document Summarization Models Synthesize?
Jay DeYoung
Stephanie C. Martinez
Iain J. Marshall
Byron C. Wallace
100
8
0
31 Jan 2023
Truth Machines: Synthesizing Veracity in AI Language Models
Luke Munn
Liam Magee
Vanicka Arora
SyDa
HILM
44
32
0
28 Jan 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
140
75
0
12 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
99
8
0
05 Jan 2023
A Survey on Knowledge-Enhanced Pre-trained Language Models
Chaoqi Zhen
Yanlei Shang
Xiangyu Liu
Yifei Li
Yong Chen
Dell Zhang
VLM
KELM
96
3
0
27 Dec 2022
DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines
Prakhar Gupta
Yang Liu
Di Jin
Behnam Hedayatnia
Spandana Gella
Sijia Liu
P. Lange
Julia Hirschberg
Dilek Z. Hakkani-Tür
113
5
0
20 Dec 2022
DSI++: Updating Transformer Memory with New Documents
Sanket Vaibhav Mehta
Jai Gupta
Yi Tay
Mostafa Dehghani
Vinh Q. Tran
J. Rao
Marc Najork
Emma Strubell
Donald Metzler
CLL
103
46
0
19 Dec 2022
Talking About Large Language Models
Murray Shanahan
AI4CE
134
275
0
07 Dec 2022
Language Models as Agent Models
Jacob Andreas
LLMAG
92
141
0
03 Dec 2022
Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen
Behnam Neyshabur
Harsh Mehta
MLT
121
15
0
20 Nov 2022
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
Thomas Hartvigsen
S. Sankaranarayanan
Hamid Palangi
Yoon Kim
Marzyeh Ghassemi
KELM
168
177
0
20 Nov 2022
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
Stephen Casper
K. Hariharan
Dylan Hadfield-Menell
AAML
91
11
0
18 Nov 2022
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
Ella Neeman
Roee Aharoni
Or Honovich
Leshem Choshen
Idan Szpektor
Omri Abend
KELM
CML
107
84
0
10 Nov 2022
On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
Xu Guo
Han Yu
LM&MA
VLM
145
30
0
06 Nov 2022
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
326
563
0
01 Nov 2022
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
Aaron Mueller
Yudi Xia
Tal Linzen
MILM
113
10
0
25 Oct 2022
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models
Alessandro Stolfo
Zhijing Jin
Kumar Shridhar
Bernhard Schölkopf
Mrinmaya Sachan
ELM
OOD
LRM
147
66
0
21 Oct 2022
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
77
8
0
19 Oct 2022
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar
Vidhisha Balachandran
Lucille Njoo
Antonios Anastasopoulos
Yulia Tsvetkov
ELM
192
91
0
14 Oct 2022
Mass-Editing Memory in a Transformer
Kevin Meng
Arnab Sen Sharma
A. Andonian
Yonatan Belinkov
David Bau
KELM
VLM
163
601
0
13 Oct 2022
Improving Data-Efficient Fossil Segmentation via Model Editing
Indu Panigrahi
Ryan Manzuk
A. Maloof
Ruth C. Fong
88
1
0
08 Oct 2022
Learning by Distilling Context
Charles Burton Snell
Dan Klein
Ruiqi Zhong
ReLM
LRM
241
48
0
30 Sep 2022