
Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)

8 June 2020
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
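For context on the citing work listed below: Linformer's core idea is to replace the O(n²) token-to-token attention map with a low-rank approximation, projecting the length-n key and value sequences down to a fixed length k so that attention costs O(n·k) time and memory. Below is a minimal single-head sketch of that mechanism, assuming PyTorch; the projection names E and F and the sizes n, k, d follow the paper's notation, but the code is an illustrative reconstruction, not the authors' released implementation.

import torch

def linformer_attention(Q, K, V, E, F):
    # Q, K, V: (n, d) queries/keys/values; E, F: (k, n) learned projections
    # applied along the sequence axis, as in the paper.
    K_low = E @ K                                  # (k, d): keys compressed from n to k rows
    V_low = F @ V                                  # (k, d): values compressed likewise
    scores = Q @ K_low.T / Q.shape[-1] ** 0.5      # (n, k) score matrix, never (n, n)
    return torch.softmax(scores, dim=-1) @ V_low   # (n, d) output, O(n*k) time and memory

# Illustrative sizes and random init (hypothetical, not from the paper's experiments);
# in practice E and F are trained with the rest of the model.
n, d, k = 1024, 64, 256
Q, K, V = (torch.randn(n, d) for _ in range(3))
E = torch.randn(k, n) / n ** 0.5
F = torch.randn(k, n) / n ** 0.5
out = linformer_attention(Q, K, V, E, F)           # torch.Size([1024, 64])

Because the paper argues the softmax attention matrix is effectively low rank, a small fixed k preserves accuracy while cost grows only linearly in sequence length n.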

Papers citing "Linformer: Self-Attention with Linear Complexity"

50 / 1,050 papers shown

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
Cai Zhou, Rose Yu, Yusu Wang
04 Apr 2024

Optimizing the Deployment of Tiny Transformers on Low-Power MCUs
Victor J. B. Jung, Luca Bompani, Moritz Scherer, Francesco Conti, Luca Benini
03 Apr 2024

Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
Sehyun Choi
03 Apr 2024

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad, Sergey Zakharov, Vitor Campagnolo Guizilini, Adrien Gaidon, Z. Kira, Rares Ambrus
01 Apr 2024 · ViT

From Similarity to Superiority: Channel Clustering for Time Series Forecasting
Jialin Chen, J. E. Lenssen, Aosong Feng, Weihua Hu, Matthias Fey, Leandros Tassiulas, J. Leskovec, Rex Ying
31 Mar 2024 · AI4TS

Transformers-based architectures for stroke segmentation: A review
Yalda Zafari-Ghadim, Essam A. Rashed, M. Mabrok
27 Mar 2024 · MedIm

Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
Jiqun Chu, Zuoquan Lin
26 Mar 2024 · AI4TS

PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models
Jinyi Li, Yihuai Lan, Lei Wang, Hao Wang
26 Mar 2024

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Youpeng Zhao, Di Wu, Jun Wang
26 Mar 2024

Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar, Souvik Kundu, Kai Zheng, P. Beerel
25 Mar 2024

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu, Diana Marculescu
24 Mar 2024

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection
Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt
23 Mar 2024 · AAML

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
22 Mar 2024 · VLM

ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding
Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Jun-Wei Hsieh, Hui-Kai Su, W. Kuo
22 Mar 2024 · ViT, MoE

Hierarchical Skip Decoding for Efficient Autoregressive Text Generation
Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang
22 Mar 2024

vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim
20 Mar 2024 · MoMe, ViT

TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces, Jorge López-Moreno
19 Mar 2024 · DiffM

MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi
19 Mar 2024

HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching
Ying Chen, Yong-Jin Liu, Kai Wu, Qiang Nie, Shang Xu, Huifang Ma, Bing Wang, Chengjie Wang
19 Mar 2024 · VLM

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, ..., Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang
18 Mar 2024 · RALM

Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
Sung-Jin Hong, Seokju Cho, Seungryong Kim, Stephen Lin
17 Mar 2024 · ViT

StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining
Tushar Kataria, Beatrice Knudsen, Shireen Y. Elhabian
17 Mar 2024 · DiffM, MedIm

EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
Abu Zahid Bin Aziz, Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Y. Elhabian
16 Mar 2024 · ViT, MedIm

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
15 Mar 2024 · ViT

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou
13 Mar 2024 · MLT

StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan
13 Mar 2024 · OffRL

TrafficGPT: Breaking the Token Barrier for Efficient Long Traffic Analysis and Generation
Jian Qu, Xiaobo Ma, Jianfeng Li
09 Mar 2024 · AI4TS

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou
07 Mar 2024 · 3DV

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
07 Mar 2024

Mastering Memory Tasks with World Models
Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
07 Mar 2024 · CLL, OffRL

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastián M. Palacio, Andreas Dengel
05 Mar 2024

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Abdullah Nazhat Abdullah, Tarkan Aydin
04 Mar 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang
04 Mar 2024 · ViT

ATP: Enabling Fast LLM Serving via Attention on Top Principal Keys
Yue Niu, Saurav Prakash, Salman Avestimehr
01 Mar 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
28 Feb 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi
28 Feb 2024 · VLM

Adaptation of Biomedical and Clinical Pretrained Models to French Long Documents: A Comparative Study
Adrien Bazoge, Emmanuel Morin, B. Daille, P. Gourraud
26 Feb 2024

Trajectory Prediction for Autonomous Driving Using a Transformer Network
Zhenning Li, Hao Yu
26 Feb 2024

Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park, Edward Choi
23 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge
21 Feb 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe
20 Feb 2024 · VLM

'Keep it Together': Enforcing Cohesion in Extractive Summaries by Simulating Human Memory
Ronald Cardenas, Matthias Shen, Shay B. Cohen
16 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024 · MQ

Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks
Zihang Song, Prabodh Katti, Osvaldo Simeone, Bipin Rajendran
14 Feb 2024

FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami, Monte Hoover, P. S. Dulepet, R. Duraiswami
12 Feb 2024

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren, Kenny Q. Zhu
09 Feb 2024

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
07 Feb 2024 · LLMAG

CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera
06 Feb 2024

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter
05 Feb 2024

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning
Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin
04 Feb 2024 · LMTD