
Linformer: Self-Attention with Linear Complexity
arXiv:2006.04768 · 8 June 2020
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
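Since this page collects follow-up work on Linformer's linear-complexity attention, a brief sketch of the core idea may be useful: the n × n softmax attention map is replaced by an n × k map, with keys and values projected along the sequence axis by learned matrices E and F, so compute and memory grow linearly in sequence length n for fixed k. The single-head PyTorch module below is a minimal illustrative sketch under those assumptions, not the authors' released code; the class name, initialization, and default k are assumptions.

```python
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    """Minimal single-head Linformer-style self-attention sketch.

    Keys and values are projected from sequence length n down to a
    fixed k, so the attention map is (n x k) rather than (n x n).
    """

    def __init__(self, d_model: int, seq_len: int, k: int = 256):
        super().__init__()
        self.to_q = nn.Linear(d_model, d_model)
        self.to_kv = nn.Linear(d_model, 2 * d_model)
        # E, F: learned (k x n) projections applied along the sequence axis.
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len**0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len**0.5)
        self.scale = d_model**-0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model)
        q = self.to_q(x)                                       # (b, n, d)
        keys, values = self.to_kv(x).chunk(2, dim=-1)          # each (b, n, d)
        keys = torch.einsum('kn,bnd->bkd', self.E, keys)       # (b, k, d)
        values = torch.einsum('kn,bnd->bkd', self.F, values)   # (b, k, d)
        # Attention map is (n x k), linear in sequence length.
        attn = torch.softmax(q @ keys.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ values                                   # (b, n, d)

x = torch.randn(2, 1024, 64)  # batch 2, sequence length 1024
out = LinformerSelfAttention(d_model=64, seq_len=1024, k=128)(x)
print(out.shape)  # torch.Size([2, 1024, 64])
```

In the paper, the projected dimension k is chosen independently of n, and the projections can be shared across heads and layers to save parameters, which is what makes the overall cost O(nk), i.e. linear in n for fixed k.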

Papers citing "Linformer: Self-Attention with Linear Complexity"

Showing 50 of 1,050 citing papers, newest first. Community topic tags appear in brackets after each title.
• Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
  Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Z. Xie, Zhong-Yi Lu, Ji-Rong Wen (04 Jun 2021)
• Luna: Linear Unified Nested Attention
  Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer (03 Jun 2021)
• Container: Context Aggregation Network [ViT]
  Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi (02 Jun 2021)
• Database Reasoning Over Text [ReLM, LMTD, AI4TS]
  James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, A. Halevy (02 Jun 2021)
• Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
  Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang (02 Jun 2021)
• THG: Transformer with Hyperbolic Geometry [ViT]
  Zhe Liu, Yibin Xu (01 Jun 2021)
• DoT: An efficient Double Transformer for NLP tasks with tables
  Syrine Krichene, Thomas Müller, Julian Martin Eisenschlos (01 Jun 2021)
• Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
  Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu (31 May 2021)
• Choose a Transformer: Fourier or Galerkin
  Shuhao Cao (31 May 2021)
• LEAP: Learnable Pruning for Transformer-based Models
  Z. Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He (30 May 2021)
• Less is More: Pay Less Attention in Vision Transformers [ViT]
  Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai (29 May 2021)
• An Attention Free Transformer [ViT]
  Shuangfei Zhai, Walter A. Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, J. Susskind (28 May 2021)
• Towards mental time travel: a hierarchical memory for reinforcement learning agents
  Andrew Kyle Lampinen, Stephanie C. Y. Chan, Andrea Banino, Felix Hill (28 May 2021)
• Sequence Parallelism: Long Sequence Training from System Perspective
  Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You (26 May 2021)
• POCFormer: A Lightweight Transformer Architecture for Detection of COVID-19 Using Point of Care Ultrasound [MedIm]
  Shehan Perera, S. Adhikari, Alper Yilmaz (20 May 2021)
• DCAP: Deep Cross Attentional Product Network for User Response Prediction
  Zekai Chen, Fangtian Zhong, Zhumin Chen, Xiao Zhang, Robert Pless, Xiuzhen Cheng (18 May 2021)
• Relative Positional Encoding for Transformers with Linear Complexity
  Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard (18 May 2021)
• Neural Error Mitigation of Near-Term Quantum Simulations
  Elizabeth R. Bennewitz, Florian Hopfmueller, B. Kulchytskyy, Juan Carrasquilla, Pooya Ronagh (17 May 2021)
• Doc2Dict: Information Extraction as Text Generation
  Benjamin Townsend, Eamon Ito-Fisher, Lily Zhang, Madison May (16 May 2021)
• Not All Memories are Created Equal: Learning to Forget by Expiring [CLL]
  Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan (13 May 2021)
• EL-Attention: Memory Efficient Lossless Attention for Generation [VLM]
  Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang (11 May 2021)
• Poolingformer: Long Document Modeling with Pooling Attention
  Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen (10 May 2021)
• T-EMDE: Sketching-based global similarity for cross-modal retrieval
  Barbara Rychalska, Mikolaj Wieczorek, Jacek Dąbrowski (10 May 2021)
• Dispatcher: A Message-Passing Approach To Language Modelling
  A. Cetoli (09 May 2021)
• FNet: Mixing Tokens with Fourier Transforms
  James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon (09 May 2021)
• Long-Span Summarization via Local Attention and Content Selection
  Potsawee Manakul, Mark Gales (08 May 2021)
• High-Resolution Optical Flow from 1D Attention and Correlation
  Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong (28 Apr 2021)
• Shot Contrastive Self-Supervised Learning for Scene Boundary Detection [SSL]
  Shixing Chen, Xiaohan Nie, David D. Fan, Dongqing Zhang, Vimal Bhat, Raffay Hamid (28 Apr 2021)
• Transfer training from smaller language model
  Han Zhang (23 Apr 2021)
• Multiscale Vision Transformers [ViT]
  Haoqi Fan, Bo Xiong, K. Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer (22 Apr 2021)
• Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
  Honai Ueoka, Yugo Murawaki, Sadao Kurohashi (20 Apr 2021)
• Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence
  Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell (19 Apr 2021)
• A Simple and Effective Positional Encoding for Transformers
  Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng (18 Apr 2021)
• Semantic Frame Forecast [AI4TS]
  Huang Chieh-Yang, Ting-Hao 'Kenneth' Huang (12 Apr 2021)
• Updater-Extractor Architecture for Inductive World State Representations
  A. Moskvichev, James Liu (12 Apr 2021)
• Not All Attention Is All You Need
  Hongqiu Wu, Hai Zhao, Min Zhang (10 Apr 2021)
• Transformers: "The End of History" for NLP?
  Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov (09 Apr 2021)
• Fourier Image Transformer [ViT]
  T. Buchholz, Florian Jug (06 Apr 2021)
• ViViT: A Video Vision Transformer [ViT]
  Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid (29 Mar 2021)
• Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding [ViT]
  Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao (29 Mar 2021)
• A Practical Survey on Faster and Lighter Transformers
  Quentin Fournier, G. Caron, Daniel Aloise (26 Mar 2021)
• High-Fidelity Pluralistic Image Completion with Transformers [ViT]
  Bo Liu, Jingbo Zhang, Dongdong Chen, Jing Liao (25 Mar 2021)
• Finetuning Pretrained Transformers into RNNs
  Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith (24 Mar 2021)
• The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [AI4TS]
  Sushant Singh, A. Mahmood (23 Mar 2021)
• Instance-level Image Retrieval using Reranking Transformers [ViT]
  Fuwen Tan, Jiangbo Yuan, Vicente Ordonez (22 Mar 2021)
• Self-Supervised Test-Time Learning for Reading Comprehension [SSL]
  Pratyay Banerjee, Tejas Gokhale, Chitta Baral (20 Mar 2021)
• Scalable Vision Transformers with Hierarchical Pooling [ViT]
  Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai (19 Mar 2021)
• Value-aware Approximate Attention
  Ankit Gupta, Jonathan Berant (17 Mar 2021)
• Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study
  Shaoxiong Ji, M. Holtta, Pekka Marttinen (11 Mar 2021)
• Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models [VLM, TPM]
  Sam Bond-Taylor, Adam Leach, Yang Long, Chris G. Willcocks (08 Mar 2021)