Transformer Feed-Forward Layers Are Key-Value Memories

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
29 December 2020
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
    KELM

Papers citing "Transformer Feed-Forward Layers Are Key-Value Memories"

50 / 790 papers shown
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin
Michael Munn
Hanna Mazzawi
Michael Wunder
J. Gonzalvo
ReLM, OffRL, LRM
803
27
0
24 Dec 2025
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
Ruilin Li
Yibin Wang
Wenhong Zhu
Chenglin Li
J. Zhang
Chenliang Li
Junchi Yan
Jiaqi Wang
KELM
207
0
0
04 Dec 2025
Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs
Xuyuan Liu
Zhengzhang Chen
Xinshuai Dong
Yanchi Liu
Xujiang Zhao
Shengyu Chen
Haoyu Wang
Yujun Yan
Haifeng Chen
KELM, CLL
241
0
0
25 Nov 2025
CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation
Shilei Cao
Ziyang Gong
Hehai Lin
Yang Liu
Jiashun Cheng
...
C. Qin
Hong Cheng
Xue Yang
Juepeng Zheng
Haohuan Fu
303
0
0
25 Nov 2025
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
Rio Fear
Payel Mukhopadhyay
Michael McCabe
Alberto Bietti
M. Cranmer
LLMSV, AI4CE
567
4
0
25 Nov 2025
Bridging Philosophy and Machine Learning: A Structuralist Framework for Classifying Neural Network Representations
Yildiz Culcu
AI4CE
230
0
0
23 Nov 2025
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
Pinaki Prasad Guha Neogi
Ahmad Mohammadshirazi
Dheeraj Kulshrestha
R. Ramnath
MoE
191
0
0
22 Nov 2025
RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models
Dayan Pan
Jingyuan Wang
Yilong Zhou
Jiawei Cheng
Pengyue Jia
Xiangyu Zhao
89
0
0
21 Nov 2025
Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks
Éloïse Benito-Rodriguez
Einar Urdshals
Jasmina Nasufi
Nicky Pochinkov
147
1
0
20 Nov 2025
Adaptive Focus Memory for Language Models
Christopher Cruz
KELM
305
0
0
16 Nov 2025
Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations
Reginald Zhiyan Chen
Heng-Sheng Chang
P. Mehta
115
1
0
13 Nov 2025
Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion
Feng Guo
Yuntao Wen
Shen Gao
Junshuo Zhang
Shuo Shang
KELM, MU
497
0
0
11 Nov 2025
On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception
Sanaz Saki Norouzi
Mohammad Masjedi
Pascal Hitzler
169
0
0
09 Nov 2025
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Amit Levi
Raz Lapid
Rom Himelstein
Yaniv Nemcovsky
Ravid Shwartz Ziv
A. Mendelson
MQ
203
2
0
09 Nov 2025
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
145
0
0
08 Nov 2025
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Vinaik Chhetri
A.B. Siddique
Umar Farooq
KELM
214
0
0
05 Nov 2025
ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
Chengzhang Yu
Zening Lu
Chenyang Zheng
C. Wang
Yiming Zhang
Zhanpeng Jin
KELM
192
0
0
03 Nov 2025
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs
Jiahao Liu
Zijian Wang
Kuo Zhao
Dong Hu
KELM
188
0
0
31 Oct 2025
The Structure of Relation Decoding Linear Operators in Large Language Models
Miranda Anna Christ
Adrián Csiszárik
Gergely Becsó
D. Varga
174
0
0
30 Oct 2025
Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling
Hyunji Lee
Wenhao Yu
Hongming Zhang
Kaixin Ma
J. Kim
Dong Yu
Minjoon Seo
Mamba
262
3
0
30 Oct 2025
A Survey on Unlearning in Large Language Models
Ruichen Qiu
Jiajun Tan
Jiayue Pu
Honglin Wang
Xiao-Shan Gao
Fei Sun
MU, AILaw, PILM
789
2
0
29 Oct 2025
MemEIC: A Step Toward Continual and Compositional Knowledge Editing
Jin Seong
Jiyun Park
Wencke Liermann
Hongseok Choi
Yoonji Nam
Hyun Kim
Soojong Lim
Namhoon Lee
KELM
369
0
0
29 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature
Jack Merullo
Srihita Vatsavaya
Lucius Bushnaq
Owen Lewis
269
2
0
28 Oct 2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Jiayu Liu
Wei Dai
Zhenya Huang
Ning Miao
Enhong Chen
LRM
111
2
0
28 Oct 2025
Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
Rabin Adhikari
LRM
101
0
0
28 Oct 2025
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
Lei Liu
Zhongyi Yu
Hong Wang
Huanshuo Dong
Haiyang Xin
Hongwei Zhao
B. Li
230
0
0
27 Oct 2025
Probing Neural Combinatorial Optimization Models
Zhiqin Zhang
Yining Ma
Zhiguang Cao
Hoong Chuin Lau
148
2
0
25 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM, CLL
430
3
0
25 Oct 2025
Large Language Models as Model Organisms for Human Associative Learning
Camila Kolling
Vy A. Vo
Mariya Toneva
KELM
251
0
0
24 Oct 2025
Model-Aware Tokenizer Transfer
Mykola Haltiuk
Aleksander Smywiński-Pohl
166
3
0
24 Oct 2025
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
Shiva Sreeram
Alaa Maalouf
Pratyusha Sharma
Daniela Rus
154
0
0
23 Oct 2025
Restoring Pruned Large Language Models via Lost Component Compensation
Zijian Feng
Hanzhang Zhou
Zixiao Zhu
Tianjiao Li
Jia Jim Deryl Chua
Lee Onn Mak
Gee Wah Ng
Kezhi Mao
212
2
0
22 Oct 2025
Fairness Evaluation and Inference Level Mitigation in LLMs
Afrozah Nadeem
Mark Dras
Usman Naseem
KELM
200
3
0
21 Oct 2025
How Do LLMs Use Their Depth?
Akshat Gupta
Jay Yeung
Gopala Anumanchipalli
Anna Ivanova
145
5
0
21 Oct 2025
DePass: Unified Feature Attributing by Simple Decomposed Forward Pass
Xiangyu Hong
Che Jiang
Kai Tian
Biqing Qi
Youbang Sun
Ning Ding
Bowen Zhou
187
2
0
21 Oct 2025
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Haoyu Huang
Hong Ting Tsang
Jiaxin Bai
Xi Peng
Gong Zhang
Yangqiu Song
RALM, SLR
245
1
0
20 Oct 2025
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Dayan Pan
Zhaoyang Fu
Jingyuan Wang
Xiao Han
Yue Zhu
Xiangyu Zhao
KELM, CLL
159
1
0
20 Oct 2025
Layer Specialization Underlying Compositional Reasoning in Transformers
Jing Liu
LRM
195
0
0
20 Oct 2025
Atomic Literary Styling: Mechanistic Manipulation of Prose Generation in Neural Language Models
Tsogt-Ochir Enkhbayar
168
0
0
19 Oct 2025
Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Xuansheng Wu
Xiaoman Pan
Wenlin Yao
Jianshu Chen
ReLM, LRM
199
0
0
17 Oct 2025
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
Tina Behnia
Puneesh Deora
Christos Thrampoulidis
152
0
0
17 Oct 2025
Emergence of Linear Truth Encodings in Language Models
Shauli Ravfogel
Gilad Yehudai
Tal Linzen
Joan Bruna
A. Bietti
KELM
191
5
0
17 Oct 2025
Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models
Daniil Gurgurov
Josef van Genabith
Simon Ostermann
MoE
258
0
0
15 Oct 2025
Simple Projection Variants Improve ColBERT Performance
Benjamin Clavié
Sean Lee
Rikiya Takehi
Aamir Shakir
Makoto P. Kato
202
2
0
14 Oct 2025
STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models
Geunyeong Jeong
Juoh Sun
Seonghee Lee
Harksoo Kim
KELM
186
0
0
12 Oct 2025
Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
Jiaye Li
Baoyou Chen
Hui Li
Zilong Dong
Jingdong Wang
Siyu Zhu
142
0
0
12 Oct 2025
EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing
Sicheng Lyu
Yu Gu
Xinyu Wang
Jerry Huang
Sitao Luan
Yufei Cui
Xiao-Wen Chang
Peng Lu
KELM
119
2
0
11 Oct 2025
Utilizing dynamic sparsity on pretrained DETR
Reza Sedghi
Anand Subramoney
David Kappel
MoE
171
1
0
10 Oct 2025
On the Representations of Entities in Auto-regressive Large Language Models
Victor Morand
Josiane Mothe
Benjamin Piwowarski
154
0
0
10 Oct 2025
Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar
Deepak Nathani
William Yang Wang
Tanmoy Chakraborty
167
1
0
10 Oct 2025