Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Deceptive Alignment Monitoring
Andres Carranza
Dhruv Pai
Rylan Schaeffer
Arnuv Tandon
Oluwasanmi Koyejo
76
9
0
20 Jul 2023
Can Neural Network Memorization Be Localized?
Pratyush Maini
Michael C. Mozer
Hanie Sedghi
Zachary Chase Lipton
J. Zico Kolter
Chiyuan Zhang
TDI
80
55
0
18 Jul 2023
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Danny Halawi
Jean-Stanislas Denain
Jacob Steinhardt
97
59
0
18 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
103
115
0
18 Jul 2023
Discovering Variable Binding Circuitry with Desiderata
Xander Davies
Max Nadeau
Nikhil Prakash
Tamar Rott Shaham
David Bau
73
15
0
07 Jul 2023
An Overview of Catastrophic AI Risks
Dan Hendrycks
Mantas Mazeika
Thomas Woodside
SILM
86
186
0
21 Jun 2023
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
101
14
0
16 Jun 2023
Propagating Knowledge Updates to LMs Through Distillation
Shankar Padmanabhan
Yasumasa Onoe
Michael J.Q. Zhang
Greg Durrett
Eunsol Choi
KELM
100
20
0
15 Jun 2023
Operationalising Representation in Natural Language Processing
J. Harding
121
13
0
14 Jun 2023
Measuring and Modifying Factual Knowledge in Large Language Models
Pouya Pezeshkpour
KELM
70
18
0
09 Jun 2023
Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories
Shizhe Diao
Tianyang Xu
Ruijia Xu
Jiawei Wang
Tong Zhang
MoE
AI4CE
60
41
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
104
6
0
06 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
195
584
0
06 Jun 2023
Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency
Owen Queen
Thomas Hartvigsen
Teddy Koker
Huan He
Theodoros Tsiligkaridis
Marinka Zitnik
AI4TS
98
21
0
03 Jun 2023
Learning Transformer Programs
Dan Friedman
Alexander Wettig
Danqi Chen
89
36
0
01 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
116
96
0
01 Jun 2023
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
Dana Arad
Hadas Orgad
Yonatan Belinkov
KELM
142
19
0
01 Jun 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Liang Zhao
ALM
169
140
0
30 May 2023
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Zehao Wang
Alexander Ku
Jason Baldridge
Thomas Griffiths
Been Kim
UQCV
92
13
0
29 May 2023
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
J. Hoelscher-Obermaier
Julia Persson
Esben Kran
Ioannis Konstas
Fazl Barez
KELM
96
62
0
27 May 2023
Theoretical and Practical Perspectives on what Influence Functions Do
Andrea Schioppa
Katja Filippova
Ivan Titov
Polina Zablotskaia
TDI
65
18
0
26 May 2023
Backpack Language Models
John Hewitt
John Thickstun
Christopher D. Manning
Percy Liang
KELM
103
16
0
26 May 2023
ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models
Yuxin Zhang
Weiming Dong
Fan Tang
Nisha Huang
Haibin Huang
Chongyang Ma
Tong-Yee Lee
Oliver Deussen
Changsheng Xu
DiffM
126
81
0
25 May 2023
Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
KELM
95
66
0
25 May 2023
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler
Jingxuan He
Slobodan Jenko
Martin Vechev
HILM
85
119
0
25 May 2023
Editable Graph Neural Network for Node Classifications
Zirui Liu
Zhimeng Jiang
Shaochen Zhong
Kaixiong Zhou
Li Li
Rui Chen
Soo-Hyun Choi
Helen Zhou
86
6
0
24 May 2023
Referral Augmentation for Zero-Shot Information Retrieval
Michael Tang
Shunyu Yao
John Yang
Karthik Narasimhan
88
3
0
24 May 2023
Meta-Learning Online Adaptation of Language Models
Nathan J. Hu
E. Mitchell
Christopher D. Manning
Chelsea Finn
KELM
99
37
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
111
54
0
24 May 2023
Editing Common Sense in Transformers
Anshita Gupta
Debanjan Mondal
Akshay Krishna Sheshadri
Wenlong Zhao
Xiang Lorraine Li
Sarah Wiegreffe
Niket Tandon
KELM
105
30
0
24 May 2023
Mitigating Temporal Misalignment by Discarding Outdated Facts
Michael J.Q. Zhang
Eunsol Choi
KELM
HILM
103
20
0
24 May 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Zexuan Zhong
Zhengxuan Wu
Christopher D. Manning
Christopher Potts
Danqi Chen
KELM
112
217
0
24 May 2023
Can Transformers Learn to Solve Problems Recursively?
Shizhuo Zhang
Curt Tigges
Stella Biderman
Maxim Raginsky
Talia Ringer
54
17
0
24 May 2023
All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren
Qipeng Guo
Zhijing Jin
Shauli Ravfogel
Mrinmaya Sachan
Bernhard Schölkopf
Ryan Cotterell
77
4
0
23 May 2023
Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models
Shashank Sonkar
Richard G. Baraniuk
33
1
0
23 May 2023
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina J. Semnani
Violet Z. Yao
He Zhang
M. Lam
KELM
AI4MH
118
81
0
23 May 2023
Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models
Tim Schott
Daniel Furman
Shreshta Bhat
ELM
76
4
0
23 May 2023
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Shuo Zhang
Liangming Pan
Junzhou Zhao
Wenjie Wang
HILM
62
0
0
23 May 2023
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
Shahar Katz
Yonatan Belinkov
84
28
0
22 May 2023
Can LLMs facilitate interpretation of pre-trained language models?
Basel Mousi
Nadir Durrani
Fahim Dalvi
95
13
0
22 May 2023
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
Jian Xie
Kai Zhang
Jiangjie Chen
Renze Lou
Yu-Chuan Su
RALM
326
181
0
22 May 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
126
144
0
22 May 2023
Editing Large Language Models: Problems, Methods, and Opportunities
Yunzhi Yao
Peng Wang
Bo Tian
Shuyang Cheng
Zhoubo Li
Shumin Deng
Huajun Chen
Ningyu Zhang
KELM
124
314
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
240
614
0
22 May 2023
Can We Edit Factual Knowledge by In-Context Learning?
Ce Zheng
Lei Li
Qingxiu Dong
Yuxuan Fan
Zhiyong Wu
Jingjing Xu
Baobao Chang
KELM
91
217
0
22 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
143
6
0
21 May 2023
Decouple knowledge from parameters for plug-and-play language modeling
Xin Cheng
Yankai Lin
Preslav Nakov
Dongyan Zhao
Rui Yan
KELM
93
2
0
19 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
87
93
0
15 May 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
91
1
0
15 May 2023
FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Shangbin Feng
Vidhisha Balachandran
Yuyang Bai
Yulia Tsvetkov
KELM
HILM
79
59
0
14 May 2023
Previous
1
2
3
...
19
20
21
22
Next