Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)

8 June 2020 · Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
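
The paper's central observation is that the softmax attention matrix is approximately low rank, so the length-n key and value sequences can be linearly projected down to a fixed dimension k, reducing self-attention from O(n²) to O(n·k) time and memory. A minimal sketch of that idea in PyTorch follows; the class name, parameter names, and hyperparameters are illustrative assumptions for exposition, not the authors' reference implementation, and it assumes a fixed maximum sequence length with one shared projection per layer.

```python
# Sketch of Linformer-style attention: keys/values are compressed along the
# sequence axis by learned projections E, F before attention is computed.
# Names and defaults here are illustrative, not the official implementation.
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    def __init__(self, dim, max_seq_len, k=256, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned projections E, F compress the length-n sequence axis of
        # K and V to a fixed k, so the attention map is (n x k), not (n x n).
        self.E = nn.Parameter(torch.randn(max_seq_len, k) / k ** 0.5)
        self.F = nn.Parameter(torch.randn(max_seq_len, k) / k ** 0.5)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, key, v = self.to_q(x), self.to_k(x), self.to_v(x)
        # Project along the sequence dimension: (b, n, d) -> (b, k, d).
        key = torch.einsum('bnd,nk->bkd', key, self.E[:n])
        v = torch.einsum('bnd,nk->bkd', v, self.F[:n])
        # Split into heads.
        q = q.view(b, n, h, d // h).transpose(1, 2)        # (b, h, n, d/h)
        key = key.view(b, -1, h, d // h).transpose(1, 2)   # (b, h, k, d/h)
        v = v.view(b, -1, h, d // h).transpose(1, 2)       # (b, h, k, d/h)
        attn = torch.softmax(q @ key.transpose(-2, -1) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)  # back to (b, n, d)
        return self.to_out(out)

# Example: a 4096-token sequence yields 4096 x 256 attention maps.
x = torch.randn(2, 4096, 512)
y = LinformerSelfAttention(dim=512, max_seq_len=4096)(x)
print(y.shape)  # torch.Size([2, 4096, 512])
```

Because k is fixed (the paper argues it can stay constant as n grows), cost scales linearly in sequence length rather than quadratically.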

Papers citing "Linformer: Self-Attention with Linear Complexity"

Showing 50 of 1,050 citing papers:
Studying inductive biases in image classification task
  N. Arizumi · 31 Oct 2022

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
  Roshan S. Sharma, Bhiksha Raj · 29 Oct 2022

Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
  Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong · 27 Oct 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma · AI4CE · 25 Oct 2022

A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives
  Carlos Hernandez-Olivan, Javier Hernandez-Olivan, J. R. Beltrán · MGen · 25 Oct 2022

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling
  Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huang Zhong, Mingqian Zhong, Yuk-Yu Nancy Ip, Pascale Fung · LM&MA · 25 Oct 2022

Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
  Aosong Feng, Irene Z Li, Yuang Jiang, Rex Ying · 21 Oct 2022

Mitigating spectral bias for the multiscale operator learning
  Xinliang Liu, Bo Xu, Shuhao Cao, Lei Zhang · AI4CE · 19 Oct 2022

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
  Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu · MGen · 19 Oct 2022

The Devil in Linear Transformer
  Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong · 19 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
  Jihyeon Janel Lee, Wooyoung Kang, Eun-Sol Kim · CoGe · 19 Oct 2022

Token Merging: Your ViT But Faster
  Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman · MoMe · 17 Oct 2022

What Makes Convolutional Models Great on Long Sequence Modeling?
  Yuhong Li, Tianle Cai, Yi Zhang, De-huai Chen, Debadeepta Dey · VLM · 17 Oct 2022

Modeling Context With Linear Attention for Scalable Document-Level Translation
  Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith · 16 Oct 2022

Linear Video Transformer with Feature Fixation
  Kaiyue Lu, Zexia Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, ..., Xuyang Shen, Huizhong Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong · 15 Oct 2022

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
  Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong · 3DV · 14 Oct 2022

Exploring Long-Sequence Masked Autoencoders
  Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen · 13 Oct 2022

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
  Jian Wang, Chen-xi Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang · ViT · 13 Oct 2022

LSG Attention: Extrapolation of pretrained Transformers to long sequences
  Charles Condevaux, S. Harispe · 13 Oct 2022

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
  Brian Bartoldson, B. Kailkhura, Davis W. Blalock · 13 Oct 2022
OpenCQA: Open-ended Question Answering with Charts
  Shankar Kantharaj, Do Xuan Long, Rixie Tiffany Ko Leong, J. Tan, Enamul Hoque, Shafiq Joty · 12 Oct 2022

Designing Robust Transformers using Robust Kernel Density Estimation
  Xing Han, Tongzheng Ren, T. Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho · 11 Oct 2022

Memory transformers for full context and high-resolution 3D Medical Segmentation
  Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler · ViT, MedIm · 11 Oct 2022

Turbo Training with Token Dropout
  Tengda Han, Weidi Xie, Andrew Zisserman · ViT · 10 Oct 2022

Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
  H. H. Mao · 09 Oct 2022

Bird-Eye Transformers for Text Generation Models
  Lei Sha, Yuhang Song, Yordan Yordanov, Tommaso Salvatori, Thomas Lukasiewicz · 08 Oct 2022

Hierarchical Graph Transformer with Adaptive Node Sampling
  Zaixin Zhang, Qi Liu, Qingyong Hu, Cheekong Lee · 08 Oct 2022

Compressed Vision for Efficient Video Understanding
  Olivia Wiles, João Carreira, Iain Barr, Andrew Zisserman, Mateusz Malinowski · 06 Oct 2022

Temporally Consistent Transformers for Video Generation
  Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel · DiffM · 05 Oct 2022

WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability
  Yufan Zhuang, Zihan Wang, Fangbo Tao, Jingbo Shang · ViT, AI4TS · 05 Oct 2022
DARTFormer: Finding The Best Type Of Attention
  Jason Brown, Yiren Zhao, Ilia Shumailov, Robert D. Mullins · 02 Oct 2022

Wide Attention Is The Way Forward For Transformers?
  Jason Brown, Yiren Zhao, Ilia Shumailov, Robert D. Mullins · 02 Oct 2022

Grouped self-attention mechanism for a memory-efficient Transformer
  Bumjun Jung, Yusuke Mukuta, Tatsuya Harada · AI4TS · 02 Oct 2022

E-Branchformer: Branchformer with Enhanced merging for speech recognition
  Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe · 30 Sep 2022

Transformers for Object Detection in Large Point Clouds
  Felicia Ruppel, F. Faion, Claudius Gläser, Klaus C. J. Dietmayer · ViT · 30 Sep 2022

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
  Martin H. Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris · 29 Sep 2022

Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
  Fengyuan Shi, Ruopeng Gao, Weilin Huang, Limin Wang · 28 Sep 2022

Liquid Structural State-Space Models
  Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus · AI4TS · 26 Sep 2022

Mega: Moving Average Equipped Gated Attention
  Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer · 21 Sep 2022

Adapting Pretrained Text-to-Text Models for Long Text Sequences
  Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih · RALM, VLM · 21 Sep 2022
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
  Hongxiang Fan, Thomas C. P. Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah · 20 Sep 2022

Graph Reasoning Transformer for Image Parsing
  Dong Zhang, Jinhui Tang, Kwang-Ting Cheng · ViT · 20 Sep 2022

Real-time Online Video Detection with Temporal Smoothing Transformers
  Yue Zhao, Philipp Krahenbuhl · ViT · 19 Sep 2022

Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence
  Sunghwan Hong, Seokju Cho, Seung Wook Kim, Stephen Lin · 3DV · 19 Sep 2022

Hydra Attention: Efficient Attention with Many Heads
  Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman · 15 Sep 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores
  Shigang Li, Kazuki Osawa, Torsten Hoefler · 14 Sep 2022

On The Computational Complexity of Self-Attention
  Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, Chinmay Hegde · 11 Sep 2022

Pre-Training a Graph Recurrent Network for Language Representation
  Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang · GNN · 08 Sep 2022

A Circular Window-based Cascade Transformer for Online Action Detection
  Shuyuan Cao, Weihua Luo, Bairui Wang, Wei Emma Zhang, Lin Ma · 30 Aug 2022

Learning Heterogeneous Interaction Strengths by Trajectory Prediction with Graph Neural Network
  Seungwoong Ha, Hawoong Jeong · 28 Aug 2022