Transformer Feed-Forward Layers Are Key-Value Memories

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

29 December 2020

Papers citing "Transformer Feed-Forward Layers Are Key-Value Memories"

50 / 790 papers shown

Learning without training: The implicit dynamics of in-context learning

803

24 Dec 2025

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

207

04 Dec 2025

Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs

241

25 Nov 2025

CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation

...

303

25 Nov 2025

Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

567

25 Nov 2025

Bridging Philosophy and Machine Learning: A Structuralist Framework for Classifying Neural Network Representations

Yildiz Culcu

AI4CE

230

23 Nov 2025

Exploiting the Experts: Unauthorized Compression in MoE-LLMs

Pinaki Prasad Guha Neogi

Ahmad Mohammadshirazi

Dheeraj Kulshrestha

R. Ramnath

MoE

191

22 Nov 2025

RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models

21 Nov 2025

Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks

Éloïse Benito-Rodriguez

Einar Urdshals

Jasmina Nasufi

Nicky Pochinkov

147

20 Nov 2025

Adaptive Focus Memory for Language Models

Christopher Cruz

KELM

305

16 Nov 2025

Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations

Reginald Zhiyan Chen

Heng-Sheng Chang

P. Mehta

115

13 Nov 2025

Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion

497

11 Nov 2025

On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception

Sanaz Saki Norouzi

Mohammad Masjedi

Pascal Hitzler

169

09 Nov 2025

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations

203

09 Nov 2025

Catching Contamination Before Generation: Spectral Kill Switches for Agents

Valentin Noël

145

08 Nov 2025

Understanding Robustness of Model Editing in Code LLMs: An Empirical Study

214

05 Nov 2025

ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks

192

03 Nov 2025

Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs

188

31 Oct 2025

The Structure of Relation Decoding Linear Operators in Large Language Models

174

30 Oct 2025

Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling

262

30 Oct 2025

A Survey on Unlearning in Large Language Models

789

29 Oct 2025

MemEIC: A Step Toward Continual and Compositional Knowledge Editing

369

29 Oct 2025

From Memorization to Reasoning in the Spectrum of Loss Curvature

269

28 Oct 2025

Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank

111

28 Oct 2025

Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers

Rabin Adhikari

LRM

101

28 Oct 2025

From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators

230

27 Oct 2025

Probing Neural Combinatorial Optimization Models

148

25 Oct 2025

Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs

430

25 Oct 2025

Large Language Models as Model Organisms for Human Associative Learning

251

24 Oct 2025

Model-Aware Tokenizer Transfer

Mykola Haltiuk

Aleksander Smywiński-Pohl

166

24 Oct 2025

Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples

154

23 Oct 2025

Restoring Pruned Large Language Models via Lost Component Compensation

212

22 Oct 2025

Fairness Evaluation and Inference Level Mitigation in LLMs

200

21 Oct 2025

How Do LLMs Use Their Depth?

Akshat Gupta

Jay Yeung

Gopala Anumanchipalli

Anna Ivanova

145

21 Oct 2025

DePass: Unified Feature Attributing by Simple Decomposed Forward Pass

187

21 Oct 2025

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

245

20 Oct 2025

Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models

159

20 Oct 2025

Layer Specialization Underlying Compositional Reasoning in Transformers

Jing Liu

LRM

195

20 Oct 2025

Atomic Literary Styling: Mechanistic Manipulation of Prose Generation in Neural Language Models

Tsogt-Ochir Enkhbayar

168

19 Oct 2025

Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential

199

17 Oct 2025

Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization

Tina Behnia

Puneesh Deora

Christos Thrampoulidis

152

17 Oct 2025

Emergence of Linear Truth Encodings in Language Models

191

17 Oct 2025

Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models

258

15 Oct 2025

Simple Projection Variants Improve ColBERT Performance

202

14 Oct 2025

STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models

186

12 Oct 2025

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

142

12 Oct 2025

EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing

119

11 Oct 2025

Utilizing dynamic sparsity on pretrained DETR

171

10 Oct 2025

On the Representations of Entities in Auto-regressive Large Language Models

Victor Morand

Josiane Mothe

Benjamin Piwowarski

154

10 Oct 2025

Understanding the Effects of Domain Finetuning on LLMs

167

10 Oct 2025