ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
arXiv:2006.16236 · 29 June 2020
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

Papers citing "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention"

Showing 50 of 346 citing papers. Each entry lists title, authors, topic tags (where assigned), date, and the site's three metric columns.

Bird-Eye Transformers for Text Generation Models
  Lei Sha, Yuhang Song, Yordan Yordanov, Tommaso Salvatori, Thomas Lukasiewicz
  08 Oct 2022 · 30 / 0 / 0

Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules
  Kazuki Irie, Jürgen Schmidhuber
  07 Oct 2022 · 37 / 5 / 0

WavSpA: Wavelet Space Attention for Boosting Transformers' Long Sequence Learning Ability
  Yufan Zhuang, Zihan Wang, Fangbo Tao, Jingbo Shang
  ViT, AI4TS · 05 Oct 2022 · 35 / 3 / 0

Transformer Meets Boundary Value Inverse Problems
  Ruchi Guo, Shuhao Cao, Long Chen
  MedIm · 29 Sep 2022 · 36 / 21 / 0

Lightweight Monocular Depth Estimation with an Edge Guided Network
  Xingshuai Dong, Matthew A. Garratt, S. Anavatti, H. Abbass, Junyu Dong
  MDE · 29 Sep 2022 · 25 / 2 / 0

Effective General-Domain Data Inclusion for the Machine Translation Task by Vanilla Transformers
  H. Soliman
  28 Sep 2022 · 32 / 0 / 0

Liquid Structural State-Space Models
  Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus
  AI4TS · 26 Sep 2022 · 107 / 95 / 0

From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
  Rui Wan, Shuangjie Xu, Wei Wu, Xiaoyi Zou, Tongyi Cao
  3DPC · 25 Sep 2022 · 20 / 4 / 0

Hand Hygiene Assessment via Joint Step Segmentation and Key Action Scorer
  Chenglong Li, Qiwen Zhu, Tubiao Liu, Jin Tang, Yu Su
  25 Sep 2022 · 32 / 1 / 0

Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence
  Sunghwan Hong, Seokju Cho, Seung Wook Kim, Stephen Lin
  3DV · 19 Sep 2022 · 42 / 4 / 0

Quantum Vision Transformers
  El Amine Cherrat, Iordanis Kerenidis, Natansh Mathur, Jonas Landman, M. Strahm, Yun. Y Li
  ViT · 16 Sep 2022 · 34 / 55 / 0

Efficient Methods for Natural Language Processing: A Survey
  Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
  31 Aug 2022 · 30 / 109 / 0

A Circular Window-based Cascade Transformer for Online Action Detection
  Shuyuan Cao, Weihua Luo, Bairui Wang, Wei Emma Zhang, Lin Ma
  30 Aug 2022 · 42 / 6 / 0

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
  A. Andrusenko, R. Nasretdinov, A. Romanenko
  16 Aug 2022 · 20 / 18 / 0

Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search
  Lucas N. Ferreira, Lili Mou, Jim Whitehead, Levi H. S. Lelis
  10 Aug 2022 · 23 / 17 / 0

SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
  Peng Qi, Guangtao Wang, Jing Huang
  03 Aug 2022 · 24 / 0 / 0

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
  T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
  01 Aug 2022 · 32 / 9 / 0

Neural Architecture Search on Efficient Transformers and Beyond
  Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
  28 Jul 2022 · 35 / 19 / 0

3D Siamese Transformer Network for Single Object Tracking on Point Clouds
  Le Hui, Lingpeng Wang, Ling-Yu Tang, Kaihao Lan, Jin Xie, Jian Yang
  ViT, 3DPC · 25 Jul 2022 · 31 / 59 / 0

Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
  Sunghwan Hong, Seokju Cho, Jisu Nam, Stephen Lin, Seung Wook Kim
  ViT · 22 Jul 2022 · 24 / 122 / 0

Eliminating Gradient Conflict in Reference-based Line-Art Colorization
  Zekun Li, Zhengyang Geng, Zhao Kang, Wenyu Chen, Yibo Yang
  13 Jul 2022 · 21 / 35 / 0

Pure Transformers are Powerful Graph Learners
  Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong
  06 Jul 2022 · 43 / 189 / 0

CTrGAN: Cycle Transformers GAN for Gait Transfer
  Shahar Mahpod, Noam Gaash, Hay Hoffman, Gil Ben-Artzi
  ViT · 30 Jun 2022 · 28 / 1 / 0

Deformable Graph Transformer
  Jinyoung Park, Seongjun Yun, Hyeon-ju Park, Jaewoo Kang, Jisu Jeong, KyungHyun Kim, Jung-Woo Ha, Hyunwoo J. Kim
  29 Jun 2022 · 90 / 7 / 0

Long Range Language Modeling via Gated State Spaces
  Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
  Mamba · 27 Jun 2022 · 37 / 231 / 0

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
  Jiangning Zhang, Xiangtai Li, Yabiao Wang, Chengjie Wang, Yibo Yang, Yong Liu, Dacheng Tao
  ViT · 19 Jun 2022 · 34 / 32 / 0

SimA: Simple Softmax-free Attention for Vision Transformers
  Soroush Abbasi Koohpayegani, Hamed Pirsiavash
  17 Jun 2022 · 21 / 25 / 0

Online Segmentation of LiDAR Sequences: Dataset and Algorithm
  Romain Loiseau, Mathieu Aubry, Loïc Landrieu
  3DPC · 16 Jun 2022 · 24 / 15 / 0

Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis
  Rania Briq, Chuhang Zou, L. Pishchulin, Christopher Broaddus, Juergen Gall
  14 Jun 2022 · 24 / 1 / 0

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
  Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber
  AI4TS · 03 Jun 2022 · 35 / 11 / 0

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
  Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Linfu Xie, Bing Yang, Xiong Zhang, Dan Su
  01 Jun 2022 · 19 / 5 / 0

Chefs' Random Tables: Non-Trigonometric Random Features
  Valerii Likhosherstov, K. Choromanski, Kumar Avinava Dubey, Frederick Liu, Tamás Sarlós, Adrian Weller
  30 May 2022 · 33 / 17 / 0

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
  Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio
  AI4CE · 30 May 2022 · 118 / 17 / 0

COFS: Controllable Furniture Layout Synthesis
  W. Para, Paul Guerrero, Niloy Mitra, Peter Wonka
  3DV · 29 May 2022 · 42 / 16 / 0

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
  VLM · 27 May 2022 · 78 / 2,024 / 0

Training Language Models with Memory Augmentation
  Zexuan Zhong, Tao Lei, Danqi Chen
  RALM · 25 May 2022 · 239 / 128 / 0

OnePose: One-Shot Object Pose Estimation without CAD Models
  Jiaming Sun, Zihao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou
  24 May 2022 · 98 / 148 / 0

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
  Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
  20 May 2022 · 44 / 65 / 0

FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
  Zhenpei Yang, Zhile Ren, Miguel Angel Bautista, Zaiwei Zhang, Qi Shan, Qi-Xing Huang
  3DH · 16 May 2022 · 30 / 24 / 0

Symphony Generation with Permutation Invariant Language Model
  Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li, Feng Yu, Maosong Sun
  10 May 2022 · 21 / 39 / 0

Sequencer: Deep LSTM for Image Classification
  Yuki Tatsunami, Masato Taki
  VLM, ViT · 04 May 2022 · 16 / 78 / 0

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
  Derya Soydaner
  3DV · 27 Apr 2022 · 44 / 149 / 0

Context-Aware Sequence Alignment using 4D Skeletal Augmentation
  Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys
  26 Apr 2022 · 33 / 13 / 0

Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention
  Tong Yu, Ruslan Khalitov, Lei Cheng, Zhirong Yang
  MoE · 22 Apr 2022 · 27 / 10 / 0

Efficient Linear Attention for Fast and Accurate Keypoint Matching
  Suwichaya Suwanwimolkul, S. Komorita
  3DPC, 3DV · 16 Apr 2022 · 19 / 11 / 0

A Call for Clarity in Beam Search: How It Works and When It Stops
  Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith
  11 Apr 2022 · 26 / 6 / 0

Accelerating Attention through Gradient-Based Learned Runtime Pruning
  Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, H. Esmaeilzadeh, Mingu Kang
  07 Apr 2022 · 21 / 17 / 0

Long Movie Clip Classification with State-Space Video Models
  Md. Mohaiminul Islam, Gedas Bertasius
  VLM · 04 Apr 2022 · 43 / 102 / 0

InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
  Soohyun Kim, Jongbeom Baek, Jihye Park, Gyeongnyeon Kim, Seung Wook Kim
  ViT · 30 Mar 2022 · 39 / 47 / 0

REGTR: End-to-end Point Cloud Correspondences with Transformers
  Zi Jian Yew, Gim Hee Lee
  3DPC, ViT · 28 Mar 2022 · 35 / 172 / 0

Page 1 of 7