arXiv: 2202.05262, v5 (latest)
Locating and Editing Factual Associations in GPT
10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
Papers citing "Locating and Editing Factual Associations in GPT"
50 / 1,056 papers shown
Title
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xiang Wang
Yan Hu
Wenyu Du
Reynold Cheng
Benyou Wang
Difan Zou
167
3
0
17 Feb 2025
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
165
1
0
17 Feb 2025
Precise Parameter Localization for Textual Generation in Diffusion Models
Łukasz Staniszewski
Bartosz Cywiński
Franziska Boenisch
Kamil Deja
Adam Dziedzic
DiffM
482
1
0
17 Feb 2025
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
Yi Wang
Fenghua Weng
Shangshang Yang
Zhan Qin
Minlie Huang
Wenjie Wang
KELM
AAML
123
1
0
17 Feb 2025
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
149
1
0
17 Feb 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lefei Zhang
Lijie Hu
Di Wang
LRM
212
5
0
17 Feb 2025
Sparse Autoencoder Features for Classifications and Transferability
Jack Gallifant
Shan Chen
Kuleen Sasse
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
103
6
0
17 Feb 2025
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Yixin Ou
Yunzhi Yao
N. Zhang
Hui Jin
Jiacheng Sun
Shumin Deng
Hao Sun
Ningyu Zhang
KELM
CLL
128
2
0
16 Feb 2025
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Hieu Nguyen
Zihao He
Shoumik Atul Gandre
Ujjwal Pasupulety
Sharanya Kumari Shivakumar
Kristina Lerman
HILM
138
2
0
16 Feb 2025
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
Somnath Banerjee
Sayan Layek
Pratyush Chatterjee
Animesh Mukherjee
Rima Hazra
LLMSV
159
1
0
16 Feb 2025
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun-ping Zhou
Linqi Song
Defu Lian
Yin Wei
CLL
MU
170
2
0
16 Feb 2025
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training
Matteo Saponati
Pascal Sager
Pau Vilimelis Aceituno
Thilo Stadelmann
Benjamin Grewe
41
1
0
15 Feb 2025
Superpose Singular Features for Model Merging
Haiquan Qiu
You Wu
Quanming Yao
MoMe
175
0
0
15 Feb 2025
1bit-Merging: Dynamic Quantized Merging for Large Language Models
Shuqi Liu
Yuxuan Yao
Bowei He
Zehua Liu
Xiongwei Han
Mingxuan Yuan
Han Wu
Linqi Song
MoMe
MQ
155
2
0
15 Feb 2025
LUNAR: LLM Unlearning via Neural Activation Redirection
William F. Shen
Xinchi Qiu
Meghdad Kurmanji
Alex Iacob
Lorenzo Sani
Yihong Chen
Nicola Cancedda
Nicholas D. Lane
MU
132
6
0
11 Feb 2025
MEMIT-Merge: Addressing MEMIT's Key-Value Conflicts in Same-Subject Batch Editing for LLMs
Zilu Dong
Xiangqing Shen
Rui Xia
KELM
161
1
0
11 Feb 2025
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Samuel Stevens
Wei-Lun Chao
T. Berger-Wolf
Yu-Chuan Su
VLM
152
6
0
10 Feb 2025
Reinforced Lifelong Editing for Language Models
Zherui Li
Houcheng Jiang
Hao Chen
Baolong Bi
Zhenhong Zhou
Fei Sun
Sihang Li
Xinze Wang
KELM
165
8
0
09 Feb 2025
AnyEdit: Edit Any Knowledge Encoded in Language Models
Houcheng Jiang
Sihang Li
Ningyu Zhang
Guojun Ma
Mingyang Wan
Xiang Wang
Xiangnan He
Tat-Seng Chua
KELM
150
19
0
08 Feb 2025
Learning Task Representations from In-Context Learning
Baturay Saglam
Zhuoran Yang
Dionysis Kalogerias
Amin Karbasi
133
2
0
08 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
110
0
0
08 Feb 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
LRM
111
0
0
05 Feb 2025
What is a Number, That a Large Language Model May Know It?
Raja Marjieh
Veniamin Veselovsky
Thomas Griffiths
Ilia Sucholutsky
457
3
0
03 Feb 2025
On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models
Naman Goel
HILM
134
0
0
28 Jan 2025
Evolutionary Optimization of Model Merging Recipes
Takuya Akiba
Makoto Shing
Yujin Tang
Qi Sun
David Ha
MoMe
304
126
0
28 Jan 2025
Risk-Aware Distributional Intervention Policies for Language Models
Bao Nguyen
Binh Nguyen
Duy Nguyen
V. Nguyen
127
2
0
28 Jan 2025
Unraveling Token Prediction Refinement and Identifying Essential Layers in Language Models
Jaturong Kongmanee
89
1
0
25 Jan 2025
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
Zeping Yu
Sophia Ananiadou
KELM
123
3
0
24 Jan 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
110
1
0
22 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
90
2
0
21 Jan 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
143
4
0
20 Jan 2025
Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
J. Yang
Dapeng Chen
Yajing Sun
Rongjun Li
Zhiyong Feng
Wei Peng
131
8
0
19 Jan 2025
SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
Yuchun Fan
Yongyu Mu
Yilin Wang
Lei Huang
Junhao Ruan
Yangqiu Song
Tong Xiao
Shujian Huang
Xiaocheng Feng
Jingbo Zhu
LRM
98
8
0
08 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yang Liu
130
0
0
06 Jan 2025
Reasoning-Oriented and Analogy-Based Methods for Locating and Editing in Zero-Shot Event-Relational Reasoning
Jingyao Tang
Lishuang Li
Liteng Mi
Haiming Wu
Hongbin Lu
KELM
111
0
0
03 Jan 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang
Yong Zhang
Ning Cheng
Zhitao Li
Shaojun Wang
Jing Xiao
176
0
0
02 Jan 2025
Uncovering Memorization Effect in the Presence of Spurious Correlations
Chenyu You
Haocheng Dai
Yifei Min
Jasjeet Sekhon
S. Joshi
James S. Duncan
167
3
0
01 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
166
30
0
31 Dec 2024
Neuron-Level Differentiation of Memorization and Generalization in Large Language Models
Yi-Fu Fu
Yu-Chieh Tu
Ching-Yu Tsai
Yu-Chieh Tu
Tzu-Ling Cheng
...
Yi-Ting Yang
Heng-Yi Liu
Keng-Te Liao
Da-Cheng Juan
Shou-de Lin
110
1
0
24 Dec 2024
Knowledge Editing through Chain-of-Thought
Changyue Wang
Weihang Su
Qingyao Ai
Yang Liu
KELM
110
5
0
23 Dec 2024
Joint Knowledge Editing for Information Enrichment and Probability Promotion
Wenhang Shi
Yiren Chen
Shuqing Bian
Xinyi Zhang
Zhe Zhao
Pengfei Hu
Wei Lu
Xiaoyong Du
KELM
87
1
0
22 Dec 2024
A Reality Check on Context Utilisation for Retrieval-Augmented Generation
Lovisa Hagström
Sara Vera Marjanović
Haeun Yu
Arnav Arora
Christina Lioma
Maria Maistro
Pepa Atanasova
Isabelle Augenstein
262
1
0
22 Dec 2024
Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions
Hao Du
Shang Liu
Lele Zheng
Yang Cao
Atsuyoshi Nakamura
Lei Chen
AAML
213
5
0
21 Dec 2024
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering
Yaojie Lu
Yimiao Zhou
Junlin Li
Yanjie Wang
Xuebo Liu
Daojing He
Fengyuan Liu
Min Zhang
KELM
124
3
0
18 Dec 2024
Context-DPO: Aligning Language Models for Context-Faithfulness
Baolong Bi
Shaohan Huang
Yansen Wang
Tianchi Yang
Zihan Zhang
...
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
Shenghua Liu
153
17
0
18 Dec 2024
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes
Marco Christiani
David Shriver
Marissa Connor
KELM
129
4
0
17 Dec 2024
Rethinking Associative Memory Mechanism in Induction Head
Shuo Wang
Issei Sato
189
0
0
16 Dec 2024
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu
Tajana Rosing
Hao Zhang
115
0
0
15 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
122
0
0
15 Dec 2024
DECOR: Decomposition and Projection of Text Embeddings for Text-to-Image Customization
Geonhui Jang
Jin-Hwa Kim
Yong-Hyun Park
Junho Kim
Gayoung Lee
Yonghyun Jeong
DiffM
132
0
0
12 Dec 2024