Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
arXiv: 2003.05997 · MoE · 12 March 2020
Papers citing "Efficient Content-Based Sparse Attention with Routing Transformers"
37 / 137 papers shown
Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
RALM · 25 · 80 · 0 · 19 Sep 2021

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer
Shuaicheng Li, Qianggang Cao, Lingbo Liu, Kunlin Yang, Shinan Liu, Jun Hou, Shuai Yi
ViT · 39 · 103 · 0 · 28 Aug 2021

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26 · 12 · 0 · 24 Aug 2021

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna
38 · 57 · 0 · 13 Jul 2021

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai
78 · 77 · 0 · 12 Jul 2021

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
14 · 145 · 0 · 02 Jul 2021

Variational Diffusion Models
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
DiffM · 70 · 1,060 · 0 · 01 Jul 2021

IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan, Rameswar Panda, Yi Ding, Zhangyang Wang, Rogerio Feris, A. Oliva
VLM, ViT · 39 · 153 · 0 · 23 Jun 2021

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
AIFin, MQ, AI4MH · 40 · 815 · 0 · 14 Jun 2021

Thinking Like Transformers
Gail Weiss, Yoav Goldberg, Eran Yahav
AI4CE · 35 · 127 · 0 · 13 Jun 2021

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
ViT · 50 · 1,088 · 0 · 08 Jun 2021

Learning to Efficiently Sample from Diffusion Probabilistic Models
Daniel Watson, Jonathan Ho, Mohammad Norouzi, William Chan
DiffM · 45 · 134 · 0 · 07 Jun 2021

On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov, K. Choromanski, Adrian Weller
37 · 34 · 0 · 07 Jun 2021

Learning Slice-Aware Representations with Mixture of Attentions
Cheng Wang, Sungjin Lee, Sunghyun Park, Han Li, Young-Bum Kim, R. Sarikaya
29 · 2 · 0 · 04 Jun 2021

KVT: k-NN Attention for Boosting Vision Transformers
Pichao Wang, Xue Wang, F. Wang, Ming Lin, Shuning Chang, Hao Li, R. L. Jin
ViT · 51 · 105 · 0 · 28 May 2021

Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard
33 · 44 · 0 · 18 May 2021

Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen
37 · 98 · 0 · 10 May 2021

T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska, Mikolaj Wieczorek, Jacek Dąbrowski
33 · 0 · 0 · 10 May 2021

FNet: Mixing Tokens with Fourier Transforms
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon
26 · 517 · 0 · 09 May 2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao
ViT · 29 · 329 · 0 · 29 Mar 2021

Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens
18 · 395 · 0 · 23 Mar 2021

Generating Images with Sparse Representations
C. Nash, Jacob Menick, Sander Dieleman, Peter W. Battaglia
33 · 199 · 0 · 05 Mar 2021

Perceiver: General Perception with Iterative Attention
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira
VLM, ViT, MDE · 85 · 973 · 0 · 04 Mar 2021

Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal
DiffM · 60 · 3,526 · 0 · 18 Feb 2021

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis
230 · 89 · 0 · 31 Dec 2020

Open Question Answering over Tables and Text
Wenhu Chen, Ming-Wei Chang, Eva Schlinger, Luu Anh Tuan, William W. Cohen
LMTD, RALM · 31 · 194 · 0 · 20 Oct 2020

SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis
28 · 44 · 0 · 11 Oct 2020

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
31 · 1,521 · 0 · 30 Sep 2020

Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM · 106 · 1,102 · 0 · 14 Sep 2020

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu
43 · 28 · 0 · 13 Sep 2020

Sparsifying Transformer Models with Trainable Representation Pooling
Michal Pietruszka, Łukasz Borchmann, Lukasz Garncarek
17 · 10 · 0 · 10 Sep 2020

Conformer-Kernel with Query Term Independence for Document Retrieval
Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell
19 · 21 · 0 · 20 Jul 2020

Sparse GPU Kernels for Deep Learning
Trevor Gale, Matei A. Zaharia, C. Young, Erich Elsen
17 · 227 · 0 · 18 Jun 2020

Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee
26 · 1 · 0 · 09 Jun 2020

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork
27 · 86 · 0 · 26 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM · 28 · 3,916 · 0 · 10 Apr 2020

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
218 · 7,926 · 0 · 17 Aug 2015