Linformer: Self-Attention with Linear Complexity

8 June 2020
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
arXiv:2006.04768 (PDF / HTML)
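For context, below is a minimal, single-head sketch of the linear-complexity self-attention the title refers to: keys and values are projected along the sequence axis from length n down to a fixed k, so the attention map costs O(nk) instead of O(n²). The module, parameter names, and defaults are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class LinformerSelfAttention(nn.Module):
    """Single-head sketch of Linformer-style attention (illustrative, not the official implementation)."""

    def __init__(self, dim, seq_len, k=256):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned low-rank projections that compress the sequence axis: seq_len -> k.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        q, k_, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Compress K and V along the sequence dimension: (batch, seq_len, dim) -> (batch, k, dim).
        k_ = torch.einsum('bnd,nk->bkd', k_, self.proj_k)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)
        # Attention map is (batch, seq_len, k) rather than (batch, seq_len, seq_len).
        attn = torch.softmax(torch.einsum('bnd,bkd->bnk', q, k_) * self.scale, dim=-1)
        out = torch.einsum('bnk,bkd->bnd', attn, v)
        return self.to_out(out)
```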

Papers citing "Linformer: Self-Attention with Linear Complexity"

50 / 1,050 papers shown
DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization
Ming Zhong
Yang Liu
Yichong Xu
Chenguang Zhu
Michael Zeng
VLM
AI4CE
46
125
0
06 Sep 2021
Multitask Balanced and Recalibrated Network for Medical Code Prediction
Wei Sun
Shaoxiong Ji
Min Zhang
Pekka Marttinen
22
15
0
06 Sep 2021
PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
36
21
0
06 Sep 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
26
77
0
20 Aug 2021
Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer
Chuhan Wu
Fangzhao Wu
Tao Qi
Binxing Jiao
Daxin Jiang
Yongfeng Huang
Xing Xie
38
3
0
20 Aug 2021
Fastformer: Additive Attention Can Be All You Need
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
Xing Xie
46
117
0
20 Aug 2021
Learning to Match Features with Seeded Graph Matching Network
Hongkai Chen
Zixin Luo
Jiahui Zhang
Lei Zhou
Xuyang Bai
Zeyu Hu
Chiew-Lan Tai
Long Quan
17
111
0
19 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
Sohail Ahmed Khan
Hang Dai
ViT
24
63
0
11 Aug 2021
Adaptive Multi-Resolution Attention with Linear Complexity
Yao Zhang
Yunpu Ma
T. Seidl
Volker Tresp
20
1
0
10 Aug 2021
Decoupled Transformer for Scalable Inference in Open-domain Question Answering
Haytham ElFadeel
Stanislav Peshterliev
34
1
0
05 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
30
305
0
05 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
29
35
0
05 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
21
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
32
205
0
03 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
22
567
0
30 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
45
2
0
27 Jul 2021
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu
Radu Soricut
112
41
0
25 Jul 2021
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao
Suvinay Subramanian
Gaurav Agrawal
Amir Yazdanbakhsh
T. Krishna
46
58
0
13 Jul 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren
H. Dai
Zihang Dai
Mengjiao Yang
J. Leskovec
Dale Schuurmans
Bo Dai
87
77
0
12 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Zhuowen Tu
Stefano Soatto
ViT
40
130
0
07 Jul 2021
Efficient Transformer for Direct Speech Translation
Belen Alastruey
Gerard I. Gállego
Marta R. Costa-jussá
27
7
0
07 Jul 2021
Poly-NL: Linear Complexity Non-local Layers with Polynomials
F. Babiloni
Ioannis Marras
Filippos Kokkinos
Jiankang Deng
Grigorios G. Chrysos
S. Zafeiriou
39
6
0
06 Jul 2021
What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition
Piotr Żelasko
R. Pappagari
Najim Dehak
20
13
0
05 Jul 2021
Vision Xformers: Efficient Attention for Image Classification
Pranav Jeevan
Amit Sethi
ViT
25
13
0
05 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
M. Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
32
130
0
05 Jul 2021
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
17
145
0
02 Jul 2021
Transformer-F: A Transformer network with effective methods for learning universal sentence representation
Yu Shi
18
1
0
02 Jul 2021
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
Yunhe Gao
Mu Zhou
Dimitris N. Metaxas
MedIm
ViT
13
425
0
02 Jul 2021
Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction
Byungsoo Kim
Hangyeol Yu
Dongmin Shin
Youngduck Choi
14
1
0
28 Jun 2021
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Yi Tay
Vinh Q. Tran
Sebastian Ruder
Jai Gupta
Hyung Won Chung
Dara Bahri
Zhen Qin
Simon Baumgartner
Cong Yu
Donald Metzler
51
153
0
23 Jun 2021
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Yi Ding
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
39
156
0
23 Jun 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
35
50
0
23 Jun 2021
Continuous-Time Deep Glioma Growth Models
Jens Petersen
Fabian Isensee
Gregor Koehler
Paul F. Jäger
David Zimmerer
...
J. Debus
S. Heiland
Martin Bendszus
Philipp Vollmuth
Klaus H. Maier-Hein
3DH
21
12
0
23 Jun 2021
Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy
Ivan Rubachev
Valentin Khrulkov
Artem Babenko
LMTD
48
703
0
22 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
42
499
0
17 Jun 2021
Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
Jerret Ross
Brian M. Belgodere
Vijil Chenthamarakshan
Inkit Padhi
Youssef Mroueh
Payel Das
AI4CE
32
274
0
17 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
29
75
0
15 Jun 2021
PairConnect: A Compute-Efficient MLP Alternative to Attention
Zhaozhuo Xu
Minghao Yan
Junyan Zhang
Anshumali Shrivastava
46
1
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
58
816
0
14 Jun 2021
Styleformer: Transformer based Generative Adversarial Networks with Style Vector
Jeeseung Park
Younggeun Kim
ViT
29
48
0
13 Jun 2021
Memory-efficient Transformers via Top-$k$ Attention
Ankit Gupta
Guy Dar
Shaya Goodman
David Ciprut
Jonathan Berant
MQ
48
51
0
13 Jun 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
Mingyu Ding
Xiaochen Lian
Linjie Yang
Peng Wang
Xiaojie Jin
Zhiwu Lu
Ping Luo
ViT
42
59
0
11 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
30
274
0
09 Jun 2021
Do Transformers Really Perform Bad for Graph Representation?
Chengxuan Ying
Tianle Cai
Shengjie Luo
Shuxin Zheng
Guolin Ke
Di He
Yanming Shen
Tie-Yan Liu
GNN
48
435
0
09 Jun 2021
Salient Positions based Attention Network for Image Classification
Sheng Fang
Kaiyu Li
Zhe Li
35
3
0
09 Jun 2021
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Rabeeh Karimi Mahabadi
James Henderson
Sebastian Ruder
MoE
67
469
0
08 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
53
1,088
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zhangyang Wang
ViT
24
216
0
08 Jun 2021
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
Shaocheng Jia
Xin Pei
W. Yao
S. Wong
3DPC
MDE
48
19
0
07 Jun 2021
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Peng Gao
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
27
40
0
06 Jun 2021