Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

30 August 2019
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
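For context on what the citing works below build on: the paper's central reformulation reads attention as kernel smoothing over the key set, with the kernel as a pluggable design choice. A minimal NumPy sketch of that reading (the function names and the softmax-kernel instantiation are ours for illustration, not code from the paper):

```python
import numpy as np

def softmax_kernel(Q, K):
    # Exponential kernel on scaled dot products; with this choice the
    # smoother below reduces to standard softmax attention. Other kernels
    # recover the attention variants the paper compares.
    d = Q.shape[-1]
    return np.exp(Q @ K.T / np.sqrt(d))

def kernel_attention(Q, K, V, kernel=softmax_kernel):
    # Attention read as Nadaraya-Watson kernel smoothing over the keys:
    #   out_i = sum_j k(q_i, k_j) * v_j / sum_j k(q_i, k_j)
    W = kernel(Q, K)                          # (n_q, n_k) kernel weights
    return (W / W.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = kernel_attention(Q, K, V)               # shape (4, 8)
```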

Papers citing "Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel"

8 of 58 citing papers shown.

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
22 Feb 2021

LieTransformer: Equivariant self-attention for Lie Groups
M. Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim
20 Dec 2020

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020

On the Computational Power of Transformers and its Implications in Sequence Modeling
S. Bhattamishra, Arkil Patel, Navin Goyal
16 Jun 2020

The Lipschitz Constant of Self-Attention
Hyunjik Kim, George Papamakarios, A. Mnih
08 Jun 2020

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller
05 Jun 2020

Kernel Self-Attention in Deep Multiple Instance Learning [SSL]
Dawid Rymarczyk, Adriana Borowa, Jacek Tabor, Bartosz Zieliński
25 May 2020

Classical Structured Prediction Losses for Sequence to Sequence Learning [AIMat]
Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato
14 Nov 2017