Linformer: Self-Attention with Linear Complexity
arXiv:2006.04768 · 8 June 2020
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
ArXiv · PDF · HTML
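
Since this listing gives only the title of the cited paper, a brief sketch of its core idea may help orient the reader: Linformer observes that the self-attention matrix is approximately low-rank and projects the length-n key and value sequences down to a fixed k rows before the softmax, cutting attention cost from O(n^2) to O(nk). The NumPy snippet below is a minimal single-head illustration of that projection trick; the function name, the shapes, and the random projection matrices E and F (learned parameters in the actual model) are this sketch's assumptions, not the authors' released implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Single-head Linformer-style attention (illustrative sketch).

    Q, K, V: (n, d) query/key/value matrices for a length-n sequence.
    E, F:    (k, n) projections compressing the n keys/values to k rows,
             so the score matrix is (n, k) rather than (n, n):
             linear in n for a fixed k.
    """
    d = Q.shape[-1]
    K_proj = E @ K                         # (k, d) compressed keys
    V_proj = F @ V                         # (k, d) compressed values
    scores = (Q @ K_proj.T) / np.sqrt(d)   # (n, k) scaled dot products
    return softmax(scores, axis=-1) @ V_proj  # (n, d) attended output

# Toy usage: 512 tokens, a 64-dim head, keys/values compressed to k = 32.
rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
print(linformer_attention(Q, K, V, E, F).shape)  # (512, 64)
```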
Papers citing "Linformer: Self-Attention with Linear Complexity"
50 / 1,050 papers shown

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
  Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 05 Mar 2021 · 52 / 373 / 0

Perceiver: General Perception with Iterative Attention
  Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira [VLM, ViT, MDE] · 04 Mar 2021 · 91 / 977 / 0

Random Feature Attention
  Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong · 03 Mar 2021 · 36 / 349 / 0

Coordination Among Neural Modules Through a Shared Global Workspace
  Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael C. Mozer, Yoshua Bengio · 01 Mar 2021 · 154 / 98 / 0

OmniNet: Omnidirectional Representations from Transformers
  Yi Tay, Mostafa Dehghani, V. Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler · 01 Mar 2021 · 47 / 26 / 0

Automated essay scoring using efficient transformer-based language models
  C. Ormerod, Akanksha Malhotra, Amir Jafari · 25 Feb 2021 · 29 / 30 / 0

LazyFormer: Self Attention with Lazy Update
  Chengxuan Ying, Guolin Ke, Di He, Tie-Yan Liu · 25 Feb 2021 · 25 / 15 / 0

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
  Tao Lei [RALM, VLM] · 24 Feb 2021 · 59 / 47 / 0

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao [ViT] · 24 Feb 2021 · 316 / 3,633 / 0

Centroid Transformers: Learning to Abstract with Attention
  Lemeng Wu, Xingchao Liu, Qiang Liu [3DPC] · 17 Feb 2021 · 61 / 28 / 0

LambdaNetworks: Modeling Long-Range Interactions Without Attention
  Irwan Bello · 17 Feb 2021 · 281 / 179 / 0

Translational Equivariance in Kernelizable Attention
  Max Horn, Kumar Shridhar, Elrich Groenewald, Philipp F. M. Baumann · 15 Feb 2021 · 16 / 7 / 0

Optimizing Inference Performance of Transformers on CPUs
  D. Dice, Alex Kogan · 12 Feb 2021 · 19 / 15 / 0

Unlocking Pixels for Reinforcement Learning via Implicit Attention
  K. Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, ..., Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller [OffRL] · 08 Feb 2021 · 33 / 3 / 0

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
  Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh · 07 Feb 2021 · 47 / 508 / 0

Structured Prediction as Translation between Augmented Natural Languages
  Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto · 14 Jan 2021 · 25 / 285 / 0

Transformers in Vision: A Survey
  Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah [ViT] · 04 Jan 2021 · 227 / 2,434 / 0

Reservoir Transformers
  Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 30 Dec 2020 · 35 / 17 / 0

RealFormer: Transformer Likes Residual Attention
  Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie · 21 Dec 2020 · 27 / 108 / 0

LieTransformer: Equivariant self-attention for Lie Groups
  M. Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim · 20 Dec 2020 · 31 / 111 / 0

Noise-Robust End-to-End Quantum Control using Deep Autoregressive Policy Networks
  Jiahao Yao, Paul Köttering, Hans Gundlach, Lin Lin, Marin Bukov · 12 Dec 2020 · 26 / 14 / 0

A Singular Value Perspective on Model Robustness
  Malhar Jere, Maghav Kumar, F. Koushanfar [AAML] · 07 Dec 2020 · 31 / 6 / 0

PlueckerNet: Learn to Register 3D Line Reconstructions
  Liu Liu, Hongdong Li, Haodong Yao, Ruyi Zha [3DPC, 3DV] · 02 Dec 2020 · 25 / 6 / 0

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen [ViT] · 01 Dec 2020 · 43 / 527 / 0

Metric Transforms and Low Rank Matrices via Representation Theory of the Real Hyperrectangle
  Josh Alman, T. Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song · 23 Nov 2020 · 6 / 1 / 0

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
  Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic · 20 Nov 2020 · 26 / 0 / 0

Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions
  Hao Chen, Chunhua Shen, Zhi Tian [ISeg] · 19 Nov 2020 · 21 / 2 / 0

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
  Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Jun Huang, Deng Cai, Wei Lin [VLM, SyDa] · 18 Nov 2020 · 39 / 20 / 0

Long Range Arena: A Benchmark for Efficient Transformers
  Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler · 08 Nov 2020 · 53 / 696 / 0

Point Transformer
  Nico Engel, Vasileios Belagiannis, Klaus C. J. Dietmayer [3DPC] · 02 Nov 2020 · 54 / 1,947 / 0

FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
  Young Jin Kim, Hany Awadalla [AI4CE] · 26 Oct 2020 · 32 / 42 / 0

Neural Databases
  James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, A. Halevy [NAI] · 14 Oct 2020 · 34 / 9 / 0

Memformer: A Memory-Augmented Transformer for Sequence Modeling
  Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu · 14 Oct 2020 · 22 / 49 / 0

Deformable DETR: Deformable Transformers for End-to-End Object Detection
  Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai [ViT] · 08 Oct 2020 · 84 / 4,940 / 0

Group Equivariant Stand-Alone Self-Attention For Vision
  David W. Romero, Jean-Baptiste Cordonnier [MDE] · 02 Oct 2020 · 26 / 58 / 0

Rethinking Attention with Performers
  K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller · 30 Sep 2020 · 63 / 1,527 / 0

Learning Hard Retrieval Decoder Attention for Transformers
  Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong · 30 Sep 2020 · 15 / 1 / 0

Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
  Rajiv Movva, Jason Zhao · 17 Sep 2020 · 18 / 12 / 0

Efficient Transformers: A Survey
  Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler [VLM] · 14 Sep 2020 · 114 / 1,103 / 0

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
  Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu · 13 Sep 2020 · 43 / 28 / 0

Sparsifying Transformer Models with Trainable Representation Pooling
  Michal Pietruszka, Łukasz Borchmann, Lukasz Garncarek · 10 Sep 2020 · 23 / 10 / 0

Compression of Deep Learning Models for Text: A Survey
  Manish Gupta, Puneet Agrawal [VLM, MedIm, AI4CE] · 12 Aug 2020 · 22 / 115 / 0

Conformer-Kernel with Query Term Independence for Document Retrieval
  Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell · 20 Jul 2020 · 27 / 21 / 0

Memory Transformer
  Andrey Kravchenko, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov [RALM] · 20 Jun 2020 · 23 / 64 / 0

Synthesizer: Rethinking Self-Attention in Transformer Models
  Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng · 02 May 2020 · 17 / 332 / 0

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
  Andis Draguns, Emīls Ozoliņš, A. Sostaks, Matiss Apinis, Kārlis Freivalds · 06 Apr 2020 · 19 / 8 / 0

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
  Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett [AI4CE] · 27 Feb 2020 · 21 / 198 / 0

Supervised and Unsupervised Neural Approaches to Text Readability
  Matej Martinc, Senja Pollak, Marko Robnik-Šikonja · 26 Jul 2019 · 10 / 138 / 0

Faster Neural Network Training with Approximate Tensor Operations
  Menachem Adelman, Kfir Y. Levy, Ido Hakimi, M. Silberstein · 21 May 2018 · 31 / 26 / 0

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman [ELM] · 20 Apr 2018 · 304 / 6,996 / 0