Linformer: Self-Attention with Linear Complexity

8 June 2020
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
arXiv:2006.04768
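The paper above proposes self-attention whose cost grows linearly in sequence length by projecting keys and values down to a fixed rank before the softmax. Below is a minimal single-head sketch of that idea, assuming toy shapes, random projection matrices E and F, and a NumPy implementation chosen for illustration; it is not the authors' released code or exact multi-head configuration.

```python
# Minimal sketch of Linformer-style linear attention (arXiv:2006.04768).
# Keys and values are compressed along the sequence dimension from n to k
# with learned projections E and F, so the attention map is n x k, not n x n.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Single-head Linformer attention.

    Q, K, V: (n, d) query/key/value matrices.
    E, F:    (k, n) projections that compress K and V along the sequence axis (k << n).
    Returns an (n, d) output with O(n*k) attention cost instead of O(n^2).
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d) compressed keys
    V_proj = F @ V                        # (k, d) compressed values
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) attention logits
    weights = softmax(scores, axis=-1)    # (n, k) attention map
    return weights @ V_proj               # (n, d) attended output

# Toy usage: a 512-token sequence compressed to rank k = 64.
rng = np.random.default_rng(0)
n, d, k = 512, 64, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
print(linformer_attention(Q, K, V, E, F).shape)  # (512, 64)
```

The only change relative to standard attention is the pair of (k, n) projections, which shrink the softmax from n x n to n x k; everything downstream of the attention map is unchanged.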

Papers citing "Linformer: Self-Attention with Linear Complexity"

Showing 50 of 1,050 citing papers.
RITA: Group Attention is All You Need for Timeseries Analytics
Jiaming Liang
Lei Cao
Samuel Madden
Z. Ives
Guoliang Li
AI4TS
21
0
0
02 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
45
3
0
02 Jun 2023
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
Matteo Pagliardini
Daniele Paliotta
Martin Jaggi
François Fleuret
LRM
23
22
0
01 Jun 2023
Coneheads: Hierarchy Aware Attention
Albert Tseng
Tao Yu
Toni J.B. Liu
Chris De Sa
3DPC
22
5
0
01 Jun 2023
Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Yingyi Chen
Qinghua Tao
F. Tonin
Johan A. K. Suykens
42
19
0
31 May 2023
Recasting Self-Attention with Holographic Reduced Representations
Mohammad Mahmudul Alam
Edward Raff
Stella Biderman
Tim Oates
James Holt
16
8
0
31 May 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
49
11
0
30 May 2023
Taylorformer: Probabilistic Predictions for Time Series and other Processes
Omer Nivron
R. Parthipan
Damon J. Wischik
BDL
AI4TS
31
2
0
30 May 2023
Networked Time Series Imputation via Position-aware Graph Enhanced Variational Autoencoders
Dingsu Wang
Yuchen Yan
Ruizhong Qiu
Yada Zhu
Kaiyu Guan
A. Margenot
Yangqiu Song
AI4TS
50
28
0
29 May 2023
Brainformers: Trading Simplicity for Efficiency
Yan-Quan Zhou
Nan Du
Yanping Huang
Daiyi Peng
Chang Lan
...
Zhifeng Chen
Quoc V. Le
Claire Cui
J.H.J. Laundon
J. Dean
MoE
24
22
0
29 May 2023
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Florian Mai
Juan Pablo Zuluaga
Titouan Parcollet
P. Motlícek
36
10
0
29 May 2023
A Quantitative Review on Language Model Efficiency Research
Meng Jiang
Hy Dang
Lingbo Tong
35
0
0
28 May 2023
Scalable Transformer for PDE Surrogate Modeling
Zijie Li
Dule Shu
A. Farimani
40
67
0
27 May 2023
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Jinqi Xiao
Miao Yin
Yu Gong
Xiao Zang
Jian Ren
Bo Yuan
VLM
ViT
50
9
0
26 May 2023
Do We Really Need a Large Number of Visual Prompts?
Youngeun Kim
Yuhang Li
Abhishek Moitra
Ruokai Yin
Priyadarshini Panda
VLM
VPVLM
50
5
0
26 May 2023
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu
Aditya Desai
Fangshuo Liao
Weitao Wang
Victor Xie
Zhaozhuo Xu
Anastasios Kyrillidis
Anshumali Shrivastava
33
204
0
26 May 2023
TranSFormer: Slow-Fast Transformer for Machine Translation
Bei Li
Yi Jing
Xu Tan
Zhen Xing
Tong Xiao
Jingbo Zhu
49
7
0
26 May 2023
UMat: Uncertainty-Aware Single Image High Resolution Material Capture
Carlos Rodriguez-Pardo
Henar Dominguez-Elvira
David Pascual-Hernández
Elena Garces
35
15
0
25 May 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami
Martin Jaggi
LLMAG
27
150
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
42
53
0
25 May 2023
Focus Your Attention (with Adaptive IIR Filters)
Shahar Lutati
Itamar Zimerman
Lior Wolf
37
10
0
24 May 2023
Frugal Prompting for Dialog Models
Bishal Santra
Sakya Basak
Abhinandan De
Manish Gupta
Pawan Goyal
30
2
0
24 May 2023
Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model
Yinghan Long
Sayeed Shafayet Chowdhury
Kaushik Roy
42
1
0
24 May 2023
TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering
Jian Wu
Yicheng Xu
Yan Gao
Jian-Guang Lou
Börje F. Karlsson
Manabu Okumura
LMTD
20
3
0
24 May 2023
A Joint Time-frequency Domain Transformer for Multivariate Time Series Forecasting
Yushu Chen
Shengzhuo Liu
Jinzhe Yang
Hao Jing
Wenlai Zhao
Guang-Wu Yang
AI4TS
29
16
0
24 May 2023
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He
Meng Yang
Minwei Feng
Jingcheng Yin
Xiang Wang
Jingwen Leng
Zhouhan Lin
ViT
48
13
0
24 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
97
565
0
22 May 2023
Non-Autoregressive Document-Level Machine Translation
Guangsheng Bao
Zhiyang Teng
Hao Zhou
Jianhao Yan
Yue Zhang
44
0
0
22 May 2023
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
Xiao Wang
Wei Zhou
Qi Zhang
Jie Zhou
Songyang Gao
Junzhe Wang
Menghan Zhang
Xiang Gao
Yunwen Chen
Tao Gui
53
7
0
22 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
37
12
0
22 May 2023
Reducing Sequence Length by Predicting Edit Operations with Large Language Models
Masahiro Kaneko
Naoaki Okazaki
28
4
0
19 May 2023
Less is More! A slim architecture for optimal language translation
Luca Herranz-Celotti
E. Rrapaj
39
0
0
18 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
Hao Chen
Jingkuan Song
Feng Zheng
ViT
32
0
0
17 May 2023
DLUE: Benchmarking Document Language Understanding
Ruoxi Xu
Hongyu Lin
Xinyan Guan
Xianpei Han
Yingfei Sun
Le Sun
ELM
44
0
0
16 May 2023
Hybrid and Collaborative Passage Reranking
Zongmeng Zhang
Wen-gang Zhou
Jiaxin Shi
Houqiang Li
27
0
0
16 May 2023
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu
Houwen Peng
Ningxin Zheng
Yuqing Yang
Han Hu
Yixuan Yuan
ViT
30
281
0
11 May 2023
BIOT: Cross-data Biosignal Learning in the Wild
Chaoqi Yang
M. P. M. Brandon Westover
Jimeng Sun
21
9
0
10 May 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
35
47
0
09 May 2023
Online Gesture Recognition using Transformer and Natural Language Processing
Guénolé Silvestre
F. Balado
O. Akinremi
Mirco Ramo
ViT
29
2
0
05 May 2023
The Role of Global and Local Context in Named Entity Recognition
Arthur Amalvy
Vincent Labatut
Richard Dufour
43
4
0
04 May 2023
A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems
Minseop Jung
Jaeseung Lee
Jibum Kim
ViT
29
11
0
03 May 2023
Sequence Modeling with Multiresolution Convolutional Memory
Jiaxin Shi
Ke Alexander Wang
E. Fox
47
13
0
02 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
116
123
0
02 May 2023
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
66
33
0
29 Apr 2023
IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
Fei Xue
Ignas Budvytis
R. Cipolla
44
13
0
28 Apr 2023
Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?
Sonal Sannigrahi
Josef van Genabith
C. España-Bonet
AILaw
42
4
0
28 Apr 2023
DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation
Yousef Yeganeh
Azade Farshad
Peter Weinberger
Seyed-Ahmad Ahmadi
Ehsan Adeli
Nassir Navab
ViT
MedIm
33
0
0
28 Apr 2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Shuai Li
Zhao Song
Yu Xia
Tong Yu
Dinesh Manocha
41
38
0
26 Apr 2023
SCM: Enhancing Large Language Model with Self-Controlled Memory Framework
Bin Wang
Xinnian Liang
Jian Yang
Huijia Huang
Shuangzhi Wu
Peihao Wu
Lu Lu
Zejun Ma
Zhoujun Li
LLMAG
KELM
RALM
98
26
0
26 Apr 2023
DuETT: Dual Event Time Transformer for Electronic Health Records
Alex Labach
Aslesha Pokhrel
Xiao Shi Huang
S. Zuberi
S. Yi
M. Volkovs
T. Poutanen
Rahul G. Krishnan
AI4TS
MedIm
28
3
0
25 Apr 2023