Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
Duanyu Feng
Yongfu Dai
Jimin Huang
Yifang Zhang
Qianqian Xie
Weiguang Han
Zhengyu Chen
Alejandro Lopez-Lira
Hao Wang
94
12
0
01 Oct 2023
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu
Wenlin Yao
Jianshu Chen
Xiaoman Pan
Xiaoyang Wang
Ninghao Liu
Dong Yu
LRM
97
33
0
30 Sep 2023
RelBERT: Embedding Relations with Language Models
Asahi Ushio
Jose Camacho-Collados
Steven Schockaert
KELM
82
1
0
30 Sep 2023
Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
T. Han
S. Nebelung
Firas Khader
Tian Wang
Gustav Mueller-Franzes
...
Jens Kleesiek
Christoph Haarburger
Keno K. Bressem
Jakob Nikolas Kather
Daniel Truhn
AAML
44
6
0
29 Sep 2023
KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models
Yiming Ju
Zheng Zhang
KELM
61
9
0
28 Sep 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
209
115
0
27 Sep 2023
MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering
Takuya Higuchi
Shaochen Xu
Avamarie Brueggeman
Zheng Liu
Tianming Liu
Xiang Li
Ninghao Liu
RALM
102
14
0
27 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
78
0
0
27 Sep 2023
Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey
Victoria Smith
Ali Shahin Shamsabadi
Carolyn Ashurst
Adrian Weller
PILM
110
27
0
27 Sep 2023
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul
Varun Chandrasekaran
Erik Jones
Suriya Gunasekar
Ranjita Naik
Hamid Palangi
Ece Kamar
Besmira Nushi
HILM
67
49
0
26 Sep 2023
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi
A. J. Chan
Sören Mindermann
Ilan Moscovitz
Alexa Y. Pan
Y. Gal
Owain Evans
J. Brauner
LLMAG
HILM
86
54
0
26 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
115
207
0
26 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
182
159
0
25 Sep 2023
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi
Fabio Massimo Zanzotto
75
3
0
21 Sep 2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Lukas Berglund
Meg Tong
Max Kaufmann
Mikita Balesni
Asa Cooper Stickland
Tomasz Korbak
Owain Evans
LRM
195
279
0
21 Sep 2023
Knowledge Sanitization of Large Language Models
Yoichi Ishibashi
Hidetoshi Shimodaira
KELM
129
25
0
21 Sep 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel DÓosterlinck
Zhengxuan Wu
Christopher Potts
MILM
77
29
0
19 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
86
12
0
16 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
145
74
0
13 Sep 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Maximilian Li
Xander Davies
Max Nadeau
KELM
MU
81
29
0
12 Sep 2023
Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models
Mansi Sakarvadia
Aswathy Ajith
Arham Khan
Daniel Grzenda
Nathaniel Hudson
André Bauer
Kyle Chard
Ian Foster
KELM
LRM
96
17
0
11 Sep 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
166
56
0
09 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyñska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
56
22
0
07 Sep 2023
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Yung-Sung Chuang
Yujia Xie
Hongyin Luo
Yoon Kim
James R. Glass
Pengcheng He
HILM
83
167
0
07 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
174
586
0
03 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
110
472
0
02 Sep 2023
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
Neel Nanda
Andrew Lee
Martin Wattenberg
FAtt
MILM
125
186
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
91
11
0
01 Sep 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
88
23
0
27 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
286
31
0
27 Aug 2023
Unified Concept Editing in Diffusion Models
Rohit Gandikota
Hadas Orgad
Yonatan Belinkov
Joanna Materzyñska
David Bau
DiffM
110
192
0
25 Aug 2023
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
95
49
0
25 Aug 2023
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang
Paul Janson
Rahaf Aljundi
Mohamed Elhoseiny
KELM
CLL
122
12
0
23 Aug 2023
Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
Adrián Csiszárik
M. Kiss
Péter Korösi-Szabó
Márton Muntag
Gergely Papp
D. Varga
MoMe
61
1
0
22 Aug 2023
GradientCoin: A Peer-to-Peer Decentralized Large Language Models
Yeqi Gao
Zhao Song
Junze Yin
89
18
0
21 Aug 2023
Eva-KELLM: A New Benchmark for Evaluating Knowledge Editing of LLMs
Suhang Wu
Minlong Peng
Yue Chen
Jinsong Su
Mingming Sun
KELM
88
39
0
19 Aug 2023
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez
Arnab Sen Sharma
Tal Haklay
Kevin Meng
Martin Wattenberg
Jacob Andreas
Yonatan Belinkov
David Bau
KELM
92
100
0
17 Aug 2023
PMET: Precise Model Editing in a Transformer
Xiaopeng Li
Shasha Li
Shangwen Wang
Jing Yang
Jun Ma
Jie Yu
KELM
165
135
0
17 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM
MU
96
30
0
16 Aug 2023
EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
Peng Wang
Ningyu Zhang
Bo Tian
Zekun Xi
Yunzhi Yao
...
Shuyang Cheng
Kangwei Liu
Yuansheng Ni
Guozhou Zheng
Huajun Chen
KELM
89
57
0
14 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
37
0
0
04 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
94
32
0
03 Aug 2023
Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI
Avijit Ghosh
D. Lakshmi
51
3
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
67
11
0
02 Aug 2023
The Hydra Effect: Emergent Self-repair in Language Model Computations
Tom McGrath
Matthew Rahtz
János Kramár
Vladimir Mikulik
Shane Legg
MILM
LRM
79
73
0
28 Jul 2023
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Matthew Barker
Emma Kallina
D. Ashok
Katherine M. Collins
Ashley Casovan
Adrian Weller
Ameet Talwalkar
Valerie Chen
Umang Bhatt
66
7
0
28 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
162
535
0
27 Jul 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen
Eden Biran
Ori Yoran
Amir Globerson
Mor Geva
KELM
110
180
0
24 Jul 2023
Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification
Neel Guha
Mayee F. Chen
Kush S. Bhatia
Azalia Mirhoseini
Frederic Sala
Christopher Ré
78
4
0
20 Jul 2023
Previous
1
2
3
...
18
19
20
21
22
Next