Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Yanming Kang, Giang Tran, H. De Sterck
arXiv:2310.11960 · 18 October 2023

Papers citing "Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences"
34 papers shown. Each entry lists the title (with topic tags in brackets where present), the authors, the publication date, and the citation count.

Multi Resolution Analysis (MRA) for Approximate Self-Attention
Zhanpeng Zeng, Sourav Pal, Jeffery Kline, G. Fung, Vikas Singh
21 Jul 2022 · 8 citations

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
17 Feb 2022 · 220 citations

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
05 Aug 2021 · 36 citations

PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations [GNN, AI4CE]
Moshe Eliasof, E. Haber, Eran Treister
04 Aug 2021 · 127 citations

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut
25 Jul 2021 · 42 citations

GRAND: Graph Neural Diffusion [GNN]
B. Chamberlain, J. Rowbottom, Maria I. Gorinova, Stefan Webb, Emanuele Rossi, M. Bronstein
21 Jun 2021 · 267 citations

Random Feature Attention
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong
03 Mar 2021 · 358 citations

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh
07 Feb 2021 · 521 citations

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler
08 Nov 2020 · 720 citations

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
22 Oct 2020 · 41,103 citations

Fourier Neural Operator for Parametric Partial Differential Equations [AI4CE]
Zong-Yi Li, Nikola B. Kovachki, Kamyar Azizzadenesheli, Burigede Liu, K. Bhattacharya, Andrew M. Stuart, Anima Anandkumar
18 Oct 2020 · 2,414 citations

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020 · 1,585 citations

Efficient Transformers: A Survey [VLM]
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
14 Sep 2020 · 1,123 citations

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu
13 Sep 2020 · 29 citations

Big Bird: Transformers for Longer Sequences [VLM]
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020 · 2,086 citations

Fast Transformers with Clustered Attention
Apoorv Vyas, Angelos Katharopoulos, François Fleuret
09 Jul 2020 · 153 citations

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020 · 1,771 citations

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
08 Jun 2020 · 1,706 citations

Language Models are Few-Shot Learners [BDL]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020 · 42,055 citations

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
16 May 2020 · 3,139 citations

Longformer: The Long-Document Transformer [RALM, VLM]
Iz Beltagy, Matthew E. Peters, Arman Cohan
10 Apr 2020 · 4,071 citations

Efficient Content-Based Sparse Attention with Routing Transformers [MoE]
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020 · 597 citations

Reformer: The Efficient Transformer [VLM]
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya
13 Jan 2020 · 2,320 citations

Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 Apr 2019 · 1,899 citations

fairseq: A Fast, Extensible Toolkit for Sequence Modeling [VLM, FaML]
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
01 Apr 2019 · 3,151 citations

Layer-Parallel Training of Deep Residual Neural Networks
Stefanie Günther, Lars Ruthotto, J. Schroder, E. Cyr, N. Gauger
11 Dec 2018 · 90 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [VLM, SSL, SSeg]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018 · 94,891 citations

Neural Ordinary Differential Equations [AI4CE]
T. Chen, Yulia Rubanova, J. Bettencourt, David Duvenaud
19 Jun 2018 · 5,111 citations

Generating Wikipedia by Summarizing Long Sequences [CVBM]
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam M. Shazeer
30 Jan 2018 · 799 citations

Attention Is All You Need [3DV]
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 131,652 citations

Stable Architectures for Deep Neural Networks
E. Haber, Lars Ruthotto
09 May 2017 · 729 citations

Pointer Sentinel Mixture Models [RALM]
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
26 Sep 2016 · 2,876 citations

SGDR: Stochastic Gradient Descent with Warm Restarts [ODL]
I. Loshchilov, Frank Hutter
13 Aug 2016 · 8,130 citations

Neural Machine Translation by Jointly Learning to Align and Translate [AIMat]
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
01 Sep 2014 · 27,311 citations