Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (arXiv:1908.11775)
30 August 2019
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Papers citing "Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel" (showing 8 of 58)
1. Linear Transformers Are Secretly Fast Weight Programmers. Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber. 22 Feb 2021.
2. LieTransformer: Equivariant self-attention for Lie Groups. M. Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim. 20 Dec 2020.
3. Rethinking Attention with Performers. K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller. 30 Sep 2020.
4. On the Computational Power of Transformers and its Implications in Sequence Modeling. S. Bhattamishra, Arkil Patel, Navin Goyal. 16 Jun 2020.
5. The Lipschitz Constant of Self-Attention. Hyunjik Kim, George Papamakarios, A. Mnih. 08 Jun 2020.
6. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers. K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller. 05 Jun 2020.
7. Kernel Self-Attention in Deep Multiple Instance Learning. Dawid Rymarczyk, Adriana Borowa, Jacek Tabor, Bartosz Zieliński. 25 May 2020.
8. Classical Structured Prediction Losses for Sequence to Sequence Learning. Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato. 14 Nov 2017.