arXiv:1908.11775
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
30 August 2019
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Papers citing "Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel" (50 of 53 papers shown)
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman, Zhao Song · 17 May 2025

A Reproduction Study: The Kernel PCA Interpretation of Self-Attention Fails Under Scrutiny
Karahan Sarıtaş, Çağatay Yıldız · 12 May 2025

Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund · LRM · 13 Mar 2025

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen · 02 Mar 2025

Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao, Zhao Song, Chiwun Yang · VGen · 01 Feb 2025

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao · 11 Jan 2025

Key-value memory in the brain
Samuel J. Gershman, Ila Fiete, Kazuki Irie · 06 Jan 2025

Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song · 03 Jan 2025

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng · KELM · 08 Nov 2024

The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund · LRM · 17 Oct 2024

Context-Scaling versus Task-Scaling in In-Context Learning
Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, M. Belkin · ReLM, LRM · 16 Oct 2024

How Effective are State Space Models for Machine Translation?
Hugo Pitorro, Pavlo Vasylenko, Marcos Vinícius Treviso, André F. T. Martins · Mamba · 07 Jul 2024

DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang · VLM · 29 Mar 2024

Data-free Weight Compress and Denoise for Large Language Models
Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin · 26 Feb 2024

Breaking Symmetry When Training Transformers
Chunsheng Zuo, Michael Guerzhoy · 06 Feb 2024

DF2: Distribution-Free Decision-Focused Learning
Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Prakash, Bo Dai, Chao Zhang · OffRL · 11 Aug 2023

Inductive biases in deep learning models for weather prediction
Jannik Thümmel, Matthias Karlbauer, S. Otte, C. Zarfl, Georg Martius, ..., Thomas Scholten, Ulrich Friedrich, V. Wulfmeyer, B. Goswami, Martin Volker Butz · AI4CE · 06 Apr 2023

Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller · 03 Feb 2023

Infomaxformer: Maximum Entropy Transformer for Long Time-Series Forecasting Problem
Peiwang Tang, Xianchao Zhang · AI4TS · 04 Jan 2023

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang · 30 Dec 2022

HigeNet: A Highly Efficient Modeling for Long Sequence Time Series Prediction in AIOps
Jiajia Li, Feng Tan, Cheng He, Zikai Wang, Haitao Song, Lingfei Wu, Pengwei Hu · 13 Nov 2022

Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning
Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di Jin, Dilek Z. Hakkani-Tür · 26 Oct 2022

Transformer Meets Boundary Value Inverse Problems
Ruchi Guo, Shuhao Cao, Long Chen · MedIm · 29 Sep 2022

Features Fusion Framework for Multimodal Irregular Time-series Events
Peiwang Tang, Xianchao Zhang · AI4TS · 05 Sep 2022

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang · 01 Aug 2022

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky · 20 May 2022

Approximating Permutations with Neural Network Components for Travelling Photographer Problem
S. Chong · 30 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · 11 Apr 2022

Wasserstein Adversarial Transformer for Cloud Workload Prediction
Shivani Arbat, V. Jayakumar, Jaewoo Lee, Wei Wang, I. Kim · AI4TS · 12 Mar 2022

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong · 17 Feb 2022

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber · 11 Feb 2022

ETSformer: Exponential Smoothing Transformers for Time-series Forecasting
Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Guosheng Lin · AI4TS · 03 Feb 2022

Learning Operators with Coupled Attention
Georgios Kissas, Jacob H. Seidman, Leonardo Ferreira Guilhoto, V. Preciado, George J. Pappas, P. Perdikaris · 04 Jan 2022

Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture
Kieran Wood, Sven Giegerich, Stephen J. Roberts, S. Zohren · AI4TS, AIFin · 16 Dec 2021

Heuristic Search Planning with Deep Neural Networks using Imitation, Attention and Curriculum Learning
Leah A. Chrestien, Tomás Pevný, Antonín Komenda, Stefan Edelkamp · 03 Dec 2021

Transformers for prompt-level EMA non-response prediction
Supriya Nagesh, Alexander Moreno, Stephanie M Carpenter, Jamie Yap, Soujanya Chatterjee, ..., Santosh Kumar, Cho Lam, D. Wetter, Inbal Nahum-Shani, James M. Rehg · 01 Nov 2021

Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement
Wenxi Liu, Qi Li, Xin Lin, Weixiang Yang, Shengfeng He, Yuanlong Yu · 06 Sep 2021

STAR: Sparse Transformer-based Action Recognition
Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, V. Narayanan · ViT · 15 Jul 2021

GraphiT: Encoding Graph Structure in Transformers
Grégoire Mialon, Dexiong Chen, Margot Selosse, Julien Mairal · 10 Jun 2021

CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan · ViT · 09 Jun 2021

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · ViT · 08 Jun 2021

Choose a Transformer: Fourier or Galerkin
Shuhao Cao · 31 May 2021

Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard · 18 May 2021

Finetuning Pretrained Transformers into RNNs
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith · 24 Mar 2021

Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong · 03 Mar 2021

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber · 22 Feb 2021

LieTransformer: Equivariant self-attention for Lie Groups
M. Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim · 20 Dec 2020

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller · 30 Sep 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal · 16 Jun 2020

The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, A. Mnih · 08 Jun 2020