ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)
8 June 2020
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

Papers citing "Linformer: Self-Attention with Linear Complexity"

50 / 1,050 papers shown
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai, Ji Lin, Chengyue Wu, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
25 Apr 2022

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
Yanxing Shi, Junxiong Cai, Yoli Shavit, Tai-Jiang Mu, Wensen Feng, Kai Zhang
25 Apr 2022 · GNN

Transformation Invariant Cancerous Tissue Classification Using Spatially Transformed DenseNet
Omar Mahdi, Ali Bou Nassif
23 Apr 2022 · MedIm

Investigating Neural Architectures by Synthetic Dataset Design
Adrien Courtois, Jean-Michel Morel, Pablo Arias
23 Apr 2022

Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention
Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang
22 Apr 2022 · MoE

NFormer: Robust Person Re-identification with Neighbor Transformer
Haochen Wang, Jiayi Shen, Yongtuo Liu, Yan Gao, E. Gavves
20 Apr 2022 · ViT

On the Locality of Attention in Direct Speech Translation
Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà
19 Apr 2022

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji
16 Apr 2022 · ViT

Efficient Linear Attention for Fast and Accurate Keypoint Matching
Suwichaya Suwanwimolkul, S. Komorita
16 Apr 2022 · 3DPC, 3DV

LaMemo: Language Modeling with Look-Ahead Memory
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang
15 Apr 2022 · KELM, RALM, CLL

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study
Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany Mak, Xiaopu Zhou, N. Ip, Pascale Fung
14 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
11 Apr 2022

Linear Complexity Randomized Self-attention Mechanism
Lin Zheng, Chong-Jun Wang, Lingpeng Kong
10 Apr 2022

Few-Shot Forecasting of Time-Series with Heterogeneous Channels
L. Brinkmeyer, Rafael Rêgo Drumond, Johannes Burchert, Lars Schmidt-Thieme
07 Apr 2022 · AI4TS

Accelerating Attention through Gradient-Based Learned Runtime Pruning
Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, H. Esmaeilzadeh, Mingu Kang
07 Apr 2022

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Joey Tianyi Zhou, Gedas Bertasius
06 Apr 2022

TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng, Gedas Bertasius
04 Apr 2022 · ViT

TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
30 Mar 2022 · ViT

A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
29 Mar 2022

Fine-tuning Image Transformers using Learnable Memory
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Andrew Jackson
29 Mar 2022 · ViT

Discovering material information using hierarchical Reformer model on financial regulatory filings
Francois Mercier, Makesh Narsimhan
28 Mar 2022 · AIFin, AI4TS

FS6D: Few-Shot 6D Pose Estimation of Novel Objects
Yisheng He, Yao Wang, Haoqiang Fan, Jian Sun, Qifeng Chen
28 Mar 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin
27 Mar 2022

Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
27 Mar 2022

Error Correction Code Transformer
Yoni Choukroun, Lior Wolf
27 Mar 2022

A Survey on Aspect-Based Sentiment Classification
Gianni Brauwers, Flavius Frasincar
27 Mar 2022 · LLMAG

A General Survey on Attention Mechanisms in Deep Learning
Gianni Brauwers, Flavius Frasincar
27 Mar 2022

Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto, Nicole Finnie, Mauricio Muñoz, Chaithanya Kumar Mummadi, J. H. Metzen
25 Mar 2022 · AAML, ViT

Vision Transformer Compression with Structured Pruning and Low Rank Approximation
Ankur Kumar
25 Mar 2022 · ViT

Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models
Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos
23 Mar 2022 · MQ

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai
23 Mar 2022

ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention
Yang Liu, Jiaxiang Liu, L. Chen, Yuxiang Lu, Shi Feng, Zhida Feng, Yu Sun, Hao Tian, Huancheng Wu, Hai-feng Wang
23 Mar 2022

Open-Vocabulary DETR with Conditional Matching
Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
22 Mar 2022 · ObjD, VLM

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu
21 Mar 2022 · ViT, MDE

FAR: Fourier Aerial Video Recognition
D. Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming-Shun Lin, Tianyi Zhou
21 Mar 2022

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang
19 Mar 2022 · ViT, 3DPC

Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang
18 Mar 2022 · VOS

Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
16 Mar 2022 · RALM

Enriched CNN-Transformer Feature Aggregation Networks for Super-Resolution
Jinsu Yoo, Taehoon Kim, Sihaeng Lee, Seunghyeon Kim, Hankook Lee, Tae Hyun Kim
15 Mar 2022 · SupR, ViT

Long Document Summarization with Top-down and Bottom-up Inference
Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong
15 Mar 2022 · RALM, BDL

Block-Recurrent Transformers
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
11 Mar 2022

Towards Self-Supervised Category-Level Object Pose and Size Estimation
Yisheng He, Haoqiang Fan, Haibin Huang, Qifeng Chen, Jian Sun
06 Mar 2022

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Carmelo Scribano, Giorgia Franchini, M. Prato, Marko Bertogna
02 Mar 2022

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Bin-Rui Li, Zhongming Yu, Tian Zheng, R. Wu, Xiwen Zhang, Jian Peng, Yang You
02 Mar 2022 · AI4CE

Enhancing Local Feature Learning for 3D Point Cloud Processing using Unary-Pairwise Attention
H. Xiu, Xin Liu, Weimin Wang, Kyoung-Sook Kim, T. Shinohara, Qiong Chang, M. Matsuoka
01 Mar 2022 · 3DPC

A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Yunhe Gao, Mu Zhou, Ding Liu, Zhennan Yan, Shaoting Zhang, Dimitris N. Metaxas
28 Feb 2022 · ViT, MedIm

Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie
28 Feb 2022

State-of-the-Art in the Architecture, Methods and Applications of StyleGAN
Amit H. Bermano, Rinon Gal, Yuval Alaluf, Ron Mokady, Yotam Nitzan, Omer Tov, Or Patashnik, Daniel Cohen-Or
28 Feb 2022

Optimal-er Auctions through Attention
Dmitry Ivanov, Iskander Safiulin, Igor Filippov, Ksenia Balabaeva
26 Feb 2022

NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, Xing Xie
24 Feb 2022
