FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
arXiv:2108.02347 · 5 August 2021
T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
Papers citing "FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention" (17 papers shown)
Transformer Meets Twicing: Harnessing Unattended Residual Information. Laziz U. Abdullaev, Tan M. Nguyen. 02 Mar 2025.
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs. Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, M. Dehnavi. 28 Jan 2025.
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences. Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li. 12 Jun 2024.
Efficient Attention via Control Variates. Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong. 09 Feb 2023.
Rock Guitar Tablature Generation via Natural Language Processing. Josue Casco-Rodriguez. 12 Jan 2023.
Efficient Long Sequence Modeling via State Space Augmented Transformer. Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao. 15 Dec 2022.
Transformer Meets Boundary Value Inverse Problems. Ruchi Guo, Shuhao Cao, Long Chen. 29 Sep 2022.
Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers. Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç. 26 Sep 2022.
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization. T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang. 01 Aug 2022.
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. Derya Soydaner. 27 Apr 2022.
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform. Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna. 02 Mar 2022.
Choose a Transformer: Fourier or Galerkin. Shuhao Cao. 31 May 2021.
LambdaNetworks: Modeling Long-Range Interactions Without Attention. Irwan Bello. 17 Feb 2021.
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Haoyi Zhou, Shanghang Zhang, J. Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wan Zhang. 14 Dec 2020.
Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 28 Jul 2020.
Efficient Content-Based Sparse Attention with Routing Transformers. Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020.
A Decomposable Attention Model for Natural Language Inference. Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit. 06 Jun 2016.