Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
arXiv:2009.06097 · 13 September 2020
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu
Papers citing "Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding" (10 of 10 papers shown)
1. Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo, Valentin Vielzeuf · SSL · 18 Dec 2023

2. Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald · 28 Sep 2023

3. AutoFocusFormer: Image Segmentation off the Grid
Chen Ziwen, K. Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, A. Schwing, Alex Colburn, Li Fuxin · 24 Apr 2023

4. CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022

5. Separable Self-attention for Mobile Vision Transformers
Sachin Mehta, Mohammad Rastegari · ViT, MQ · 06 Jun 2022

6. Efficient Long Sequence Encoding via Synchronization
Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang · 15 Mar 2022

7. Long Document Summarization with Top-down and Bottom-up Inference
Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong · RALM, BDL · 15 Mar 2022

8. Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen · 10 May 2021

9. Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 28 Jul 2020

10. Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 12 Mar 2020