2402.02098
Self-attention Networks Localize When QK-eigenspectrum Concentrates
3 February 2024
Han Bao
Ryuichiro Hataya
Ryo Karakida
Papers citing
"Self-attention Networks Localize When QK-eigenspectrum Concentrates"
12 / 12 papers shown
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
42
15
0
28 Dec 2023
Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
58
42
0
23 Jun 2023
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
72
93
0
01 Jun 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
44
75
0
25 May 2023
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
185
1,330
0
10 Feb 2022
An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie
Aditi Raghunathan
Percy Liang
Tengyu Ma
ReLM
BDL
VPVLM
LRM
162
746
0
03 Nov 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
91
383
0
05 Mar 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
339
6,728
0
23 Dec 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
110
988
0
12 Feb 2020
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
252
2,842
0
26 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
326
10,464
0
21 Jul 2016
Generating Sequences With Recurrent Neural Networks
Alex Graves
GAN
129
4,031
0
04 Aug 2013