Linformer: Self-Attention with Linear Complexity

8 June 2020 · Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma · arXiv:2006.04768
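For context on the papers listed below: Linformer replaces the full n × n self-attention map with attention over keys and values that are linearly projected along the sequence dimension (the projection matrices E and F in the paper), reducing cost from O(n²) to O(n·k) for a fixed projected length k. The snippet below is a minimal single-head, unbatched NumPy sketch of that idea; the shapes and the random projection matrices are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linformer-style attention: compress K and V along the sequence
    dimension (n -> k) before scaled dot-product attention.

    Q, K, V: (n, d) token representations
    E, F:    (k, n) projection matrices with k << n (learned in the paper)
    Returns an (n, d) output at O(n * k * d) cost instead of O(n^2 * d).
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d) compressed keys
    V_proj = F @ V                        # (k, d) compressed values
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) attention logits, not (n, n)
    weights = softmax(scores, axis=-1)
    return weights @ V_proj               # (n, d)

# Toy usage: sequence length 512 compressed to 64 projected positions.
rng = np.random.default_rng(0)
n, d, k = 512, 64, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64)
```

In the paper, E and F are learned parameters (optionally shared across heads and layers); here they are random matrices purely to show the shapes and the linear cost in sequence length.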

Papers citing "Linformer: Self-Attention with Linear Complexity"

50 / 1,050 papers shown
STJLA: A Multi-Context Aware Spatio-Temporal Joint Linear Attention Network for Traffic Forecasting
Yuchen Fang, Yanjun Qin, Haiyong Luo, Fang Zhao, Chenxing Wang · GNN, AI4TS · 04 Dec 2021

Linear algebra with transformers
François Charton · AIMat · 03 Dec 2021

OCR-free Document Understanding Transformer
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park · ViT · 30 Nov 2021

Contrastive Learning for Local and Global Learning MRI Reconstruction
Qiaosi Yi, Jinhao Liu, Le Hu, Faming Fang, Guixu Zhang · 30 Nov 2021

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers
John Guibas, Morteza Mardani, Zong-Yi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro · 24 Nov 2021

Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang · ViT · 23 Nov 2021

DyFormer: A Scalable Dynamic Graph Transformer with Provable Benefits on Generalization Ability
Weilin Cong, Yanhong Wu, Yuandong Tian, Mengting Gu, Yinglong Xia, C. Chen, Mehrdad Mahdavi · AI4CE · 19 Nov 2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi, Shailesh Acharya, G. Fung, Vikas Singh · 18 Nov 2021

Comparative Study of Long Document Classification
Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi · 01 Nov 2021

PatchFormer: An Efficient Point Transformer with Patch Attention
Zhang Cheng, Haocheng Wan, Xinyi Shen, Zizhao Wu · 3DPC · 30 Oct 2021

Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Yifan Chen, Qi Zeng, Heng Ji, Yun Yang · 29 Oct 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré · 28 Oct 2021

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You · GNN · 28 Oct 2021

VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization
Mucong Ding, Kezhi Kong, Jingling Li, Chen Zhu, John P. Dickerson, Furong Huang, Tom Goldstein · GNN, MQ · 27 Oct 2021

SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schon, Li Zhang · 22 Oct 2021

Transformer Acceleration with Dynamic Sparse Attention
Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie · 21 Oct 2021

Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie · 18 Oct 2021

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher · 16 Oct 2021

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech · 16 Oct 2021

On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan · ViT · 15 Oct 2021

Meta-learning via Language Model In-context Tuning
Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, He He · 15 Oct 2021

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang, Hedi Xia, T. Nguyen, Stanley Osher · AI4CE · 13 Oct 2021

Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting
Kiran Madhusudhanan, Johannes Burchert, Nghia Duong-Trung, Stefan Born, Lars Schmidt-Thieme · AI4TS, AI4CE · 13 Oct 2021

Speech Summarization using Restricted Self-Attention
Roshan S. Sharma, Shruti Palaskar, A. Black, Florian Metze · 12 Oct 2021

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo · OffRL · 12 Oct 2021

LightSeq2: Accelerated Training for Transformer-based Models on GPUs
Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li · VLM · 12 Oct 2021

DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence
Kai-Po Chang, Wei-Yun Ma · 10 Oct 2021

Token Pooling in Vision Transformers
D. Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish K. Prabhu, Mohammad Rastegari, Oncel Tuzel · ViT · 08 Oct 2021

Efficient and Private Federated Learning with Partially Trainable Networks
Hakim Sidahmed, Zheng Xu, Ankush Garg, Yuan Cao, Mingqing Chen · FedML · 06 Oct 2021

ABC: Attention with Bounded-memory Control
Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith · 06 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng, Huijie Pan, Lingpeng Kong · 06 Oct 2021

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhenhua Ling · ViT · 06 Oct 2021

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta, Mohammad Rastegari · ViT · 05 Oct 2021

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou · MoE · 05 Oct 2021

Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus
Sohrab Ferdowsi, Nikolay Borissov, J. Knafou, P. Amini, Douglas Teodoro · 04 Oct 2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty · 30 Sep 2021

UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song · ViT · 29 Sep 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort · MQ · 27 Sep 2021

Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby, Zhe Wang, Nam Nguyen, Yanjun Qi · AI4TS · 24 Sep 2021

Predicting Attention Sparsity in Transformers
Marcos Vinícius Treviso, António Góis, Patrick Fernandes, E. Fonseca, André F. T. Martins · 24 Sep 2021

Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu, Sai Mitheran, Juhi Kamdar, Meet Gandhi · 21 Sep 2021

Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer · RALM · 19 Sep 2021

Sparse Factorization of Large Square Matrices
Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang · 16 Sep 2021

Anchor DETR: Query Design for Transformer-Based Object Detection
Yingming Wang, Xinming Zhang, Tong Yang, Jian Sun · ViT · 15 Sep 2021

PnP-DETR: Towards Efficient Visual Analysis with Transformers
Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan · ViT · 15 Sep 2021

Query-driven Segment Selection for Ranking Long Documents
Youngwoo Kim, Razieh Rahimi, Hamed Bonab, James Allan · RALM · 10 Sep 2021

Speechformer: Reducing Information Loss in Direct Speech Translation
Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi · 09 Sep 2021

Is Attention Better Than Matrix Decomposition?
Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin · 09 Sep 2021

MATE: Multi-view Attention for Table Transformer Efficiency
Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen · LMTD · 09 Sep 2021

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
Yujin Tang, David R Ha · 07 Sep 2021