Efficient Transformers: A Survey

arXiv:2009.06732
14 September 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
    VLM

Papers citing "Efficient Transformers: A Survey"

50 / 633 papers shown
SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu
Jinghan Yao
Junge Zhang
Martin Danelljan
Hang Xu
Weiguo Gao
Chunjing Xu
Thomas B. Schön
Li Zhang
18
161
0
22 Oct 2021
Transformer Acceleration with Dynamic Sparse Attention
Liu Liu
Zheng Qu
Zhaodong Chen
Yufei Ding
Yuan Xie
19
20
0
21 Oct 2021
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal
Sharath Chandra Raparthy
Irina Rish
Yoshua Bengio
Guillaume Lajoie
22
20
0
18 Oct 2021
Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
Zhe Zhou
Junling Liu
Zhenyu Gu
Guangyu Sun
64
42
0
18 Oct 2021
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye
Madian Khabsa
M. Lewis
Sinong Wang
Xiang Ren
Aaron Jaech
29
5
0
16 Oct 2021
On Learning the Transformer Kernel
Sankalan Pal Chowdhury
Adamos Solomou
Kumar Avinava Dubey
Mrinmaya Sachan
ViT
52
14
0
15 Oct 2021
DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Ziming Mao
Chen Henry Wu
Ansong Ni
Yusen Zhang
Rui Zhang
Tao Yu
Budhaditya Deb
Chenguang Zhu
Ahmed Hassan Awadallah
Dragomir R. Radev
30
56
0
15 Oct 2021
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain
Myriam Tami
M. Batteux
Céline Hudelot
AI4TS
20
2
0
15 Oct 2021
How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang
Hedi Xia
T. Nguyen
Stanley Osher
AI4CE
37
10
0
13 Oct 2021
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli
Ayan Chakrabarti
Andreas Veit
Michal Lukasik
Himanshu Jain
Frederick Liu
Yin-Wen Chang
Sanjiv Kumar
18
23
0
13 Oct 2021
Pre-trained Language Models in Biomedical Domain: A Systematic Survey
Benyou Wang
Qianqian Xie
Jiahuan Pei
Zhihong Chen
Prayag Tiwari
Zhao Li
Jie Fu
LM&MA
AI4CE
37
163
0
11 Oct 2021
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
76
66
0
08 Oct 2021
ABC: Attention with Bounded-memory Control
Hao Peng
Jungo Kasai
Nikolaos Pappas
Dani Yogatama
Zhaofeng Wu
Lingpeng Kong
Roy Schwartz
Noah A. Smith
76
22
0
06 Oct 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
26
3
0
06 Oct 2021
PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
Chao-Hong Tan
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Zhenhua Ling
ViT
22
11
0
06 Oct 2021
Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
12
133
0
27 Sep 2021
Vision Transformer Hashing for Image Retrieval
S. Dubey
S. Singh
Wei Chu
ViT
35
49
0
26 Sep 2021
Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby
Zhe Wang
Nam Nguyen
Yanjun Qi
AI4TS
69
87
0
24 Sep 2021
Named Entity Recognition and Classification on Historical Documents: A Survey
Maud Ehrmann
Ahmed Hamdi
Elvys Linhares Pontes
Matteo Romanello
A. Doucet
60
108
0
23 Sep 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
203
110
0
22 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
40
8
0
21 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
Lukas Galke
Isabelle Cuber
Christophe Meyer
Henrik Ferdinand Nolscher
Angelina Sonderecker
A. Scherp
28
2
0
17 Sep 2021
SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
Kentaro Inui
233
45
0
13 Sep 2021
Query-driven Segment Selection for Ranking Long Documents
Youngwoo Kim
Razieh Rahimi
Hamed Bonab
James Allan
RALM
25
5
0
10 Sep 2021
MATE: Multi-view Attention for Table Transformer Efficiency
Julian Martin Eisenschlos
Maharshi Gor
Thomas Müller
William W. Cohen
LMTD
69
95
0
09 Sep 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Potsawee Manakul
Mark J. F. Gales
18
5
0
08 Sep 2021
Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories
David Wilmot
Frank Keller
RALM
KELM
16
21
0
08 Sep 2021
PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
28
21
0
06 Sep 2021
∞-former: Infinite Memory Transformer
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
28
11
0
01 Sep 2021
SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts
Masanari Kimura
Takuma Nakamura
Yuki Saito
OOD
39
3
0
30 Aug 2021
A Web Scale Entity Extraction System
Xuanting Cai
Quanbin Ma
Pan Li
Jianyu Liu
Qi Zeng
Zhengkan Yang
Pushkar Tripathi
25
0
0
27 Aug 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
26
12
0
24 Aug 2021
Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer
Chuhan Wu
Fangzhao Wu
Tao Qi
Binxing Jiao
Daxin Jiang
Yongfeng Huang
Xing Xie
30
3
0
20 Aug 2021
Fastformer: Additive Attention Can Be All You Need
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
Xing Xie
46
117
0
20 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
21
35
0
05 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
32
201
0
03 Aug 2021
Representation learning for neural population activity with Neural Data Transformers
Joel Ye
C. Pandarinath
AI4TS
AI4CE
11
51
0
02 Aug 2021
A Survey of Human-in-the-loop for Machine Learning
Xingjiao Wu
Luwei Xiao
Yixuan Sun
Junhang Zhang
Tianlong Ma
Liangbo He
SyDa
40
505
0
02 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
20
565
0
30 Jul 2021
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu
Radu Soricut
112
41
0
25 Jul 2021
Clinical Relation Extraction Using Transformer-based Models
Xi Yang
Zehao Yu
Yi Guo
Jiang Bian
Yonghui Wu
LM&MA
MedIm
24
20
0
19 Jul 2021
Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
Haopeng Li
Lingbo Liu
Kunlin Yang
Shinan Liu
Junyuan Gao
Bin Zhao
Rui Zhang
Jun Hou
44
14
0
19 Jul 2021
STAR: Sparse Transformer-based Action Recognition
Feng Shi
Chonghan Lee
Liang Qiu
Yizhou Zhao
Tianyi Shen
Shivran Muralidhar
Tian Han
Song-Chun Zhu
V. Narayanan
ViT
21
26
0
15 Jul 2021
Efficient Transformer for Direct Speech Translation
Belen Alastruey
Gerard I. Gállego
Marta R. Costa-jussà
19
7
0
07 Jul 2021
Poly-NL: Linear Complexity Non-local Layers with Polynomials
F. Babiloni
Ioannis Marras
Filippos Kokkinos
Jiankang Deng
Grigorios G. Chrysos
S. Zafeiriou
31
6
0
06 Jul 2021
Clustering and attention model based for intelligent trading
Mimansa Rana
Nanxiang Mao
Ming Ao
Xiaohui Wu
Poning Liang
Matloob Khushi
21
1
0
06 Jul 2021
Vision Xformers: Efficient Attention for Image Classification
Pranav Jeevan
Amit Sethi
ViT
17
13
0
05 Jul 2021
A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni
Gowtham Ramesh
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
LRM
43
73
0
01 Jul 2021
Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani
Ajay Balasubramaniam
Shabbir Marzban
Elahe Arani
Bahram Zonooz
35
20
0
30 Jun 2021