Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.05262
Cited By
v1
v2
v3
v4
v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering
Rumi A. Allbert
James K. Wiles
Vlad Grankovsky
LLMSV
AI4CE
164
1
0
10 Dec 2024
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
A. Feder Cooper
Christopher A. Choquette-Choo
Miranda Bogen
Matthew Jagielski
Katja Filippova
...
Abigail Z. Jacobs
Andreas Terzis
Hanna M. Wallach
Nicolas Papernot
Katherine Lee
AILaw
MU
184
20
0
09 Dec 2024
Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment
Feng He
Chao Zhang
Zhixue Zhao
183
0
0
04 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
170
22
0
03 Dec 2024
Detecting Memorization in Large Language Models
Eduardo Slonski
90
0
0
02 Dec 2024
Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning
Keito Kudo
Yoichi Aoki
Tatsuki Kuribayashi
Shusaku Sone
Masaya Taniguchi
Ana Brassard
Keisuke Sakaguchi
Kentaro Inui
ReLM
LRM
144
1
0
02 Dec 2024
DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
Shwetha Ram
T. Neiman
Qianli Feng
Andrew Stuart
S. D. Tran
Trishul Chilimbi
141
2
0
28 Nov 2024
Neutralizing Backdoors through Information Conflicts for Large Language Models
Chen Chen
Yuchen Sun
Xueluan Gong
Jiaxin Gao
K. Lam
KELM
AAML
167
0
0
27 Nov 2024
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models
Pengfei Cao
Yuheng Chen
Zhuoran Jin
Yubo Chen
Kang Liu
Jun Zhao
KELM
122
0
0
26 Nov 2024
The Two-Hop Curse: LLMs trained on A
→
\rightarrow
→
B, B
→
\rightarrow
→
C fail to learn A
→
\rightarrow
→
C
Mikita Balesni
Tomek Korbak
Owain Evans
ReLM
LRM
145
0
0
25 Nov 2024
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Sohee Yang
Nora Kassner
E. Gribovskaya
Sebastian Riedel
Mor Geva
LRM
KELM
ReLM
190
9
0
25 Nov 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
161
1
0
23 Nov 2024
Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective
Shenglai Zeng
Jiankun Zhang
Bingheng Li
Yuping Lin
Tianqi Zheng
...
Hui Liu
Hui Liu
Yue Xing
Monica Xiao Cheng
Jiliang Tang
RALM
131
5
0
21 Nov 2024
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando
Oscar Obeso
Senthooran Rajamanoharan
Neel Nanda
187
33
0
21 Nov 2024
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng
Leijiang Gu
Xun Yang
Zhangling Duan
Zenglin Shi
Meng Wang
KELM
134
2
0
19 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
495
2
0
17 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
98
4
0
15 Nov 2024
Probing LLM Hallucination from Within: Perturbation-Driven Approach via Internal Knowledge
Seongmin Lee
Hsiang Hsu
Chun-Fu Chen
Duen Horng
Chau
LRM
101
2
0
14 Nov 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka
Assaf Ben-Kish
Yonatan Bitton
Idan Szpektor
Raja Giryes
VLM
163
3
0
13 Nov 2024
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley
Joe Kwon
David M. Krueger
Dmitrii Krasheninnikov
Usman Anwar
LLMSV
98
9
0
11 Nov 2024
Model Editing for LLMs4Code: How Far are We?
Xiaopeng Li
Shasha Li
Huijun Liu
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Shezheng Song
Weimin Zhang
KELM
104
4
0
11 Nov 2024
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
154
6
0
11 Nov 2024
Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel
Anej Svete
Vésteinn Snæbjarnarson
Ryan Cotterell
LRM
CML
108
5
0
11 Nov 2024
Continual Memorization of Factoids in Language Models
Howard Chen
Jiayi Geng
Adithya Bhaskar
Dan Friedman
Danqi Chen
KELM
132
1
0
11 Nov 2024
Gradient Localization Improves Lifelong Pretraining of Language Models
Jared Fernandez
Yonatan Bisk
Emma Strubell
KELM
99
2
0
07 Nov 2024
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method
Teodora Baluta
Pascal Lamblin
Daniel Tarlow
Fabian Pedregosa
Gintare Karolina Dziugaite
MU
73
2
0
07 Nov 2024
A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
Guan Zhe Hong
Nishanth Dikkala
Enming Luo
Cyrus Rashtchian
Xin Wang
Rina Panigrahy
OffRL
LRM
NAI
96
0
0
06 Nov 2024
Extracting Unlearned Information from LLMs with Activation Steering
Atakan Seyitoğlu
A. Kuvshinov
Leo Schwinn
Stephan Günnemann
MU
LLMSV
100
8
0
04 Nov 2024
Learning Where to Edit Vision Transformers
Yunqiao Yang
Long-Kai Huang
Shengzhuang Chen
Kede Ma
Ying Wei
KELM
96
1
0
04 Nov 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye
53
0
0
04 Nov 2024
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Badr AlKhamissi
Greta Tuckute
Antoine Bosselut
Martin Schrimpf
MILM
121
12
0
04 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
276
3
0
03 Nov 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Aashiq Muhamed
Mona Diab
Virginia Smith
107
3
0
01 Nov 2024
Commonsense Knowledge Editing Based on Free-Text in LLMs
Xiusheng Huang
Yequan Wang
Jun Zhao
Kang Liu
KELM
79
7
0
31 Oct 2024
Reasons and Solutions for the Decline in Model Performance after Editing
Xiusheng Huang
Jiaxiang Liu
Yequan Wang
Kang Liu
KELM
103
7
0
31 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga
Besmira Nushi
Varun Chandrasekaran
97
1
0
29 Oct 2024
Learning and Unlearning of Fabricated Knowledge in Language Models
Chen Sun
Nolan Miller
A. Zhmoginov
Max Vladymyrov
Mark Sandler
KELM
MU
60
1
0
29 Oct 2024
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
Reuben Luera
Ryan Rossi
Alexa F. Siu
Franck Dernoncourt
Tong Yu
...
Hanieh Salehy
Jian Zhao
Samyadeep Basu
Puneet Mathur
Nedim Lipka
AI4TS
139
1
0
28 Oct 2024
Causal Interventions on Causal Paths: Mapping GPT-2's Reasoning From Syntax to Semantics
Isabelle Lee
Joshua Lum
Ziyi Liu
Dani Yogatama
LRM
66
0
0
28 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
106
4
0
28 Oct 2024
Applying sparse autoencoders to unlearn knowledge in language models
Eoin Farrell
Yeu-Tong Lau
Arthur Conmy
MU
109
24
0
25 Oct 2024
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
Aryo Pradipta Gema
Chen Jin
Ahmed Abdulaal
Tom Diethe
Philip Teare
Beatrice Alex
Pasquale Minervini
Amrutha Saseendran
102
6
0
24 Oct 2024
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?
Zhengkai Lin
Z. Fu
Kai Liu
Liang Xie
Binbin Lin
Wenxiao Wang
D. Cai
Yue Wu
Jieping Ye
LRM
128
3
0
24 Oct 2024
On Explaining with Attention Matrices
Omar Naim
Nicholas Asher
74
1
0
24 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
120
5
0
24 Oct 2024
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
Dongliang Guo
Mengxuan Hu
Zihan Guan
Junfeng Guo
Thomas Hartvigsen
Sheng Li
AAML
140
2
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
169
8
0
23 Oct 2024
The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
Chen Qian
Dongrui Liu
Jie Zhang
Yong Liu
Jing Shao
95
1
0
22 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang
Kai Kiat Goh
Peixin Zhang
Jun Sun
Rose Lin Xin
Hongyu Zhang
163
0
0
22 Oct 2024
A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles
Eun-Kyoung Rosa Lee
Sathvik Nair
Naomi Feldman
112
4
0
21 Oct 2024
Previous
1
2
3
...
6
7
8
...
20
21
22
Next