Transformer Feed-Forward Layers Are Key-Value Memories


29 December 2020
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
    KELM

Papers citing "Transformer Feed-Forward Layers Are Key-Value Memories"

50 / 151 papers shown
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
96
19
0
15 Oct 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong
Yuelin Zou
Lijie Hu
Ziqian Zeng
Di Wang
Haiqin Yang
AAML
MU
39
2
0
09 Oct 2024
Temporal Reasoning Transfer from Text to Video
Lei Li
Yuanxin Liu
Linli Yao
Peiyuan Zhang
Chenxin An
Lean Wang
Xu Sun
Lingpeng Kong
Qi Liu
LRM
48
7
0
08 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
48
12
0
08 Oct 2024
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Kening Zheng
...
Junkai Chen
Peijie Jiang
Jiaheng Liu
Chang Tang
Xuming Hu
86
7
0
04 Oct 2024
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten
Stephan Günnemann
Leo Schwinn
MU
55
6
0
04 Oct 2024
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Fan Zhang
Houcheng Jiang
Kun Wang
Yunshan Ma
Shi Jie
Xiangnan He
Tat-Seng Chua
KELM
39
33
0
03 Oct 2024
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim
Hyunji Lee
Hyowon Cho
Joel Jang
Hyeonbin Hwang
Seungpil Won
Youbin Ahn
Dohaeng Lee
Minjoon Seo
KELM
99
3
0
02 Oct 2024
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Zeping Yu
Sophia Ananiadou
LRM
MILM
27
6
0
21 Sep 2024
Extracting Paragraphs from LLM Token Activations
Nicholas Pochinkov
Angelo Benoit
Lovkush Agarwal
Zainab Ali Majid
Lucile Ter-Minassian
32
1
0
10 Sep 2024
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning
Wei Chen
Zhen Huang
Liang Xie
Binbin Lin
Houqiang Li
...
Deng Cai
Yonggang Zhang
Wenxiao Wang
Xu Shen
Jieping Ye
51
6
0
03 Sep 2024
Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models
Chenhui Hu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
KELM
61
2
0
14 Aug 2024
Bridging LLMs and KGs without Fine-Tuning: Intermediate Probing Meets Subgraph-Aware Entity Descriptions
Bo Xue
Yi Xu
Yunchong Song
Yiming Pang
Yuyang Ren
Jiaxin Ding
Luoyi Fu
Xinbing Wang
OffRL
49
1
0
13 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
81
2
0
13 Aug 2024
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian
Xiaozhuan Liang
Siyuan Cheng
Qingbin Liu
Mengru Wang
Dianbo Sui
Xi Chen
Huajun Chen
Ningyu Zhang
MU
27
6
0
02 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
Memorizing Documents with Guidance in Large Language Models
Bumjin Park
Jaesik Choi
KELM
RALM
36
1
0
23 Jun 2024
Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models
Christopher Burger
Yifan Hu
Thai Le
KELM
39
0
0
22 Jun 2024
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Hoyeon Chang
Jinho Park
Seonghyeon Ye
Sohee Yang
Youngkyung Seo
Du-Seong Chang
Minjoon Seo
KELM
37
32
0
17 Jun 2024
MEMLA: Enhancing Multilingual Knowledge Editing with Neuron-Masked Low-Rank Adaptation
Jiakuan Xie
Pengfei Cao
Yuheng Chen
Yubo Chen
Kang Liu
Jun Zhao
KELM
39
3
0
17 Jun 2024
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi
Bangcheng Yang
Kailin Jiang
Xiaobo Wang
Jiaqi Li
Yifan Zhong
Yaodong Yang
Zilong Zheng
KELM
106
8
0
17 Jun 2024
DIEKAE: Difference Injection for Efficient Knowledge Augmentation and Editing of Large Language Models
Alessio Galatolo
Meriem Beloucif
Katie Winkle
35
0
0
15 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
KELM
MU
71
4
0
13 Jun 2024
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
MILM
59
16
0
06 Jun 2024
Evaluating the External and Parametric Knowledge Fusion of Large Language Models
Hao Zhang
Yuyang Zhang
Xiaoguang Li
Wenxuan Shi
Haonan Xu
...
Yasheng Wang
Lifeng Shang
Qun Liu
Yong-jin Liu
Ruiming Tang
KELM
41
4
0
29 May 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
50
3
0
28 May 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
64
20
0
28 May 2024
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
59
8
0
27 May 2024
Large Scale Knowledge Washing
Yu-Xiang Wang
Ruihan Wu
Zexue He
Xinyu Chen
Julian McAuley
MU
KELM
77
5
0
26 May 2024
Sparse Matrix in Large Language Model Fine-tuning
Haoze He
Juncheng Billy Li
Xuan Jiang
Heather Miller
MoE
27
3
0
24 May 2024
Implicit In-context Learning
Zhuowei Li
Zihao Xu
Ligong Han
Yunhe Gao
Song Wen
Di Liu
Hao Wang
Dimitris N. Metaxas
38
1
0
23 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
36
6
0
14 May 2024
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu
Andrew Liu
Zining Zhu
Gerald Penn
48
31
0
03 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
38
0
0
30 Apr 2024
CRE-LLM: A Domain-Specific Chinese Relation Extraction Framework with Fine-tuned Large Language Model
Zhengpeng Shi
Haoran Luo
LRM
ALM
38
2
0
28 Apr 2024
Large language models and linguistic intentionality
J. Grindrod
38
5
0
15 Apr 2024
Mixture of Low-rank Experts for Transferable AI-Generated Image Detection
Zihan Liu
Hanyi Wang
Yaoyu Kang
Shilin Wang
MoE
41
12
0
07 Apr 2024
Dissecting Query-Key Interaction in Vision Transformers
Xu Pan
Aaron Philip
Ziqian Xie
Odelia Schwartz
39
1
0
04 Apr 2024
Personalized LLM Response Generation with Parameterized Memory Injection
Kai Zhang
Lizhi Qing
Yangyang Kang
36
11
0
04 Apr 2024
"My agent understands me better": Integrating Dynamic Human-like Memory
  Recall and Consolidation in LLM-Based Agents
Yuki Hou
Haruki Tamoto
Homei Miyashita
LLMAG
25
23
0
31 Mar 2024
Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge
Xin Zhao
Naoki Yoshinaga
Daisuke Oba
KELM
HILM
34
10
0
08 Mar 2024
Data-free Weight Compress and Denoise for Large Language Models
Runyu Peng
Yunhua Zhou
Qipeng Guo
Yang Gao
Hang Yan
Xipeng Qiu
Dahua Lin
39
1
0
26 Feb 2024
Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
Yuheng Chen
Pengfei Cao
Yubo Chen
Yining Wang
Shengping Liu
Kang Liu
Jun Zhao
KELM
37
1
0
21 Feb 2024
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Xiaopeng Li
Shasha Li
Shezheng Song
Huijun Liu
Bing Ji
...
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Weimin Zhang
KELM
45
4
0
31 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Pratyusha Sharma
Jordan T. Ash
Dipendra Kumar Misra
LRM
19
78
0
21 Dec 2023
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA
Lang Yu
Qin Chen
Jie Zhou
Liang He
KELM
17
45
0
19 Dec 2023
A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia
Giovanni Monea
Maxime Peyrard
Martin Josifoski
Vishrav Chaudhary
Jason Eisner
Emre Kiciman
Hamid Palangi
Barun Patra
Robert West
KELM
51
12
0
04 Dec 2023
Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings
Andrea W Wen-Yi
David Mimno
30
14
0
29 Nov 2023
Assessing Knowledge Editing in Language Models via Relation Perspective
Yifan Wei
Xiaoyan Yu
Huanhuan Ma
Fangyu Lei
Yixuan Weng
Ran Song
Kang Liu
KELM
36
15
0
15 Nov 2023