arXiv:2012.14913, v2 (latest)
Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
29 December 2020
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
Papers citing
"Transformer Feed-Forward Layers Are Key-Value Memories"
50 / 790 papers shown
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin
Michael Munn
Hanna Mazzawi
Michael Wunder
J. Gonzalvo
ReLM
OffRL
LRM
803
27
0
24 Dec 2025
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
Ruilin Li
Yibin Wang
Wenhong Zhu
Chenglin Li
J. Zhang
Chenliang Li
Junchi Yan
Jiaqi Wang
KELM
207
0
0
04 Dec 2025
Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs
Xuyuan Liu
Zhengzhang Chen
Xinshuai Dong
Yanchi Liu
Xujiang Zhao
Shengyu Chen
Haoyu Wang
Yujun Yan
Haifeng Chen
KELM
CLL
241
0
0
25 Nov 2025
CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation
Shilei Cao
Ziyang Gong
Hehai Lin
Yang Liu
Jiashun Cheng
...
C. Qin
Hong Cheng
Xue Yang
Juepeng Zheng
Haohuan Fu
303
0
0
25 Nov 2025
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
Rio Fear
Payel Mukhopadhyay
Michael McCabe
Alberto Bietti
M. Cranmer
LLMSV
AI4CE
567
4
0
25 Nov 2025
Bridging Philosophy and Machine Learning: A Structuralist Framework for Classifying Neural Network Representations
Yildiz Culcu
AI4CE
230
0
0
23 Nov 2025
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
Pinaki Prasad Guha Neogi
Ahmad Mohammadshirazi
Dheeraj Kulshrestha
R. Ramnath
MoE
191
0
0
22 Nov 2025
RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models
Dayan Pan
Jingyuan Wang
Yilong Zhou
Jiawei Cheng
Pengyue Jia
Xiangyu Zhao
89
0
0
21 Nov 2025
Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks
Éloïse Benito-Rodriguez
Einar Urdshals
Jasmina Nasufi
Nicky Pochinkov
147
1
0
20 Nov 2025
Adaptive Focus Memory for Language Models
Christopher Cruz
KELM
305
0
0
16 Nov 2025
Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations
Reginald Zhiyan Chen
Heng-Sheng Chang
P. Mehta
115
1
0
13 Nov 2025
Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion
Feng Guo
Yuntao Wen
Shen Gao
Junshuo Zhang
Shuo Shang
KELM
MU
497
0
0
11 Nov 2025
On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception
Sanaz Saki Norouzi
Mohammad Masjedi
Pascal Hitzler
169
0
0
09 Nov 2025
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Amit Levi
Raz Lapid
Rom Himelstein
Yaniv Nemcovsky
Ravid Shwartz Ziv
A. Mendelson
MQ
203
2
0
09 Nov 2025
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
145
0
0
08 Nov 2025
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Vinaik Chhetri
A.B. Siddique
Umar Farooq
KELM
214
0
0
05 Nov 2025
ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
Chengzhang Yu
Zening Lu
Chenyang Zheng
C. Wang
Yiming Zhang
Zhanpeng Jin
KELM
192
0
0
03 Nov 2025
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs
Jiahao Liu
Zijian Wang
Kuo Zhao
Dong Hu
KELM
188
0
0
31 Oct 2025
The Structure of Relation Decoding Linear Operators in Large Language Models
Miranda Anna Christ
Adrián Csiszárik
Gergely Becsó
D. Varga
174
0
0
30 Oct 2025
Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling
Hyunji Lee
Wenhao Yu
Hongming Zhang
Kaixin Ma
J. Kim
Dong Yu
Minjoon Seo
Mamba
262
3
0
30 Oct 2025
A Survey on Unlearning in Large Language Models
Ruichen Qiu
Jiajun Tan
Jiayue Pu
Honglin Wang
Xiao-Shan Gao
Fei Sun
MU
AILaw
PILM
789
2
0
29 Oct 2025
MemEIC: A Step Toward Continual and Compositional Knowledge Editing
Jin Seong
Jiyun Park
Wencke Liermann
Hongseok Choi
Yoonji Nam
Hyun Kim
Soojong Lim
Namhoon Lee
KELM
369
0
0
29 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature
Jack Merullo
Srihita Vatsavaya
Lucius Bushnaq
Owen Lewis
269
2
0
28 Oct 2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Jiayu Liu
Wei Dai
Zhenya Huang
Ning Miao
Enhong Chen
LRM
111
2
0
28 Oct 2025
Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
Rabin Adhikari
LRM
101
0
0
28 Oct 2025
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
Lei Liu
Zhongyi Yu
Hong Wang
Huanshuo Dong
Haiyang Xin
Hongwei Zhao
B. Li
230
0
0
27 Oct 2025
Probing Neural Combinatorial Optimization Models
Zhiqin Zhang
Yining Ma
Zhiguang Cao
Hoong Chuin Lau
148
2
0
25 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM
CLL
430
3
0
25 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling
Vy A. Vo
Mariya Toneva
KELM
251
0
0
24 Oct 2025
Model-Aware Tokenizer Transfer
Mykola Haltiuk
Aleksander Smywiński-Pohl
166
3
0
24 Oct 2025
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
Shiva Sreeram
Alaa Maalouf
Pratyusha Sharma
Daniela Rus
154
0
0
23 Oct 2025
Restoring Pruned Large Language Models via Lost Component Compensation
Zijian Feng
Hanzhang Zhou
Zixiao Zhu
Tianjiao Li
Jia Jim Deryl Chua
Lee Onn Mak
Gee Wah Ng
Kezhi Mao
212
2
0
22 Oct 2025
Fairness Evaluation and Inference Level Mitigation in LLMs
Afrozah Nadeem
Mark Dras
Usman Naseem
KELM
200
3
0
21 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta
Jay Yeung
Gopala Anumanchipalli
Anna Ivanova
145
5
0
21 Oct 2025
DePass: Unified Feature Attributing by Simple Decomposed Forward Pass
Xiangyu Hong
Che Jiang
Kai Tian
Biqing Qi
Youbang Sun
Ning Ding
Bowen Zhou
187
2
0
21 Oct 2025
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Haoyu Huang
Hong Ting Tsang
Jiaxin Bai
Xi Peng
Gong Zhang
Yangqiu Song
RALM
SLR
245
1
0
20 Oct 2025
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Dayan Pan
Zhaoyang Fu
Jingyuan Wang
Xiao Han
Yue Zhu
Xiangyu Zhao
KELM
CLL
159
1
0
20 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu
LRM
195
0
0
20 Oct 2025
Atomic Literary Styling: Mechanistic Manipulation of Prose Generation in Neural Language Models
Tsogt-Ochir Enkhbayar
168
0
0
19 Oct 2025
Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Xuansheng Wu
Xiaoman Pan
Wenlin Yao
Jianshu Chen
ReLM
LRM
199
0
0
17 Oct 2025
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
Tina Behnia
Puneesh Deora
Christos Thrampoulidis
152
0
0
17 Oct 2025
Emergence of Linear Truth Encodings in Language Models
Shauli Ravfogel
Gilad Yehudai
Tal Linzen
Joan Bruna
A. Bietti
KELM
191
5
0
17 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
258
0
0
15 Oct 2025
Simple Projection Variants Improve ColBERT Performance
Benjamin Clavié
Sean Lee
Rikiya Takehi
Aamir Shakir
Makoto P. Kato
202
2
0
14 Oct 2025
STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models
Geunyeong Jeong
Juoh Sun
Seonghee Lee
Harksoo Kim
KELM
186
0
0
12 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
142
0
0
12 Oct 2025
EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing
Sicheng Lyu
Yu Gu
Xinyu Wang
Jerry Huang
Sitao Luan
Yufei Cui
Xiao-Wen Chang
Peng Lu
KELM
119
2
0
11 Oct 2025
Utilizing dynamic sparsity on pretrained DETR
Reza Sedghi
Anand Subramoney
David Kappel
MoE
171
1
0
10 Oct 2025
On the Representations of Entities in Auto-regressive Large Language Models
Victor Morand
Josiane Mothe
Benjamin Piwowarski
154
0
0
10 Oct 2025
Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar
Deepak Nathani
William Yang Wang
Tanmoy Chakraborty
167
1
0
10 Oct 2025