Combiner: Full Attention Transformer with Sparse Computation Cost
arXiv: 2107.05768
12 July 2021
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai

Papers citing "Combiner: Full Attention Transformer with Sparse Computation Cost"
6 / 56 papers shown

Q-ViT: Fully Differentiable Quantization for Vision Transformer
Zhexin Li, Tong Yang, Peisong Wang, Jian Cheng
ViT, MQ
19 Jan 2022

Self-attention Does Not Need O(n^2) Memory
M. Rabe, Charles Staats
LRM
10 Dec 2021

Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski
26 Oct 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM
28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE
12 Mar 2020

Pixel Recurrent Neural Networks
Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu
SSeg, GAN
25 Jan 2016