Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
arXiv 2006.16236 · 29 June 2020
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
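For context on the technique the cited paper proposes, the sketch below shows causal (autoregressive) linear attention with the φ(x) = elu(x) + 1 feature map, computed with running sums so each token updates a fixed-size state — the "transformers are RNNs" view. This is a minimal NumPy illustration, not the authors' released implementation; the array shapes, epsilon constant, and function names are illustrative assumptions.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: a positive feature map, as proposed in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Autoregressive linear attention in O(N * d_k * d_v).

    Q, K: (N, d_k) queries/keys; V: (N, d_v) values.
    Keeps running sums S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j),
    i.e. a fixed-size recurrent state instead of attending over all
    previous tokens at every step.
    """
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((Q.shape[1], V.shape[1]))  # (d_k, d_v) state
    z = np.zeros(Q.shape[1])                # (d_k,) normalizer state
    out = np.zeros_like(V, dtype=float)
    for i in range(Q.shape[0]):
        S += np.outer(Kf[i], V[i])          # accumulate phi(k_i) v_i^T
        z += Kf[i]                          # accumulate phi(k_i)
        out[i] = (Qf[i] @ S) / (Qf[i] @ z + eps)
    return out

# Example: 8 tokens, d_k = d_v = 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (8, 4)
```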

Papers citing "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention"

50 / 346 papers shown
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
  Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto · AAML · 24 May 2024
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
  Jia Guo, Shuai Lu, Weihang Zhang, Huiqi Li, Hongen Liao · ViT · 23 May 2024
Retrievable Domain-Sensitive Feature Memory for Multi-Domain Recommendation
  Yuang Zhao, Zhaocheng Du, Qinglin Jia, Linxuan Zhang, Zhenhua Dong, Ruiming Tang · 21 May 2024
Asymptotic theory of in-context learning by linear attention
  Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan · 20 May 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
  Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang · ViT · 19 May 2024
Memory Mosaics
  Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou · VLM · 10 May 2024
Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models
  Zeyu Yang, Peikun Guo, Khadija Zanna, Akane Sano, Xiaoxue Yang · DiffM · 12 Apr 2024
On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
  Cai Zhou, Rose Yu, Yusu Wang · 04 Apr 2024
GP-MoLFormer: A Foundation Model For Molecular Generation
  Jerret Ross, Brian M. Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das · 04 Apr 2024
Linear Attention Sequence Parallelism
  Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong · 03 Apr 2024
DE-HNN: An effective neural model for Circuit Netlist representation
  Zhishang Luo, Truong-Son Hy, Puoya Tabaghi, Donghyeon Koh, Michael Defferrard, Elahe Rezaei, Ryan Carey, William Rhett Davis, Rajeev Jain, Yusu Wang · 30 Mar 2024
DiJiang: Efficient Large Language Models through Compact Kernelization
  Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang · VLM · 29 Mar 2024
Physics-Informed Diffusion Models
  Jan-Hendrik Bastek, WaiChing Sun, D. Kochmann · DiffM, AI4CE · 21 Mar 2024
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
  Sung-Jin Hong, Seokju Cho, Seungryong Kim, Stephen Lin · ViT · 17 Mar 2024
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
  Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens · MoE · 14 Mar 2024
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
  Mahdi Karami, Ali Ghodsi · VLM · 28 Feb 2024
Linear Transformers are Versatile In-Context Learners
  Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge · 21 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
  Jishnu Ray Chowdhury, Cornelia Caragea · 01 Feb 2024
Delving Deeper Into Astromorphic Transformers
  Md. Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta · 18 Dec 2023
CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer
  Sicheng Wang, Hao Jiang, Lei Xiang · ViT · 14 Dec 2023
Graph Convolutions Enrich the Self-Attention in Transformers!
  Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park · 07 Dec 2023
TD-Net: A Tri-domain network for sparse-view CT reconstruction
  Xinyuan Wang, Changqing Su, Bo Xiong · MedIm · 26 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
  Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang · VLM · 21 Nov 2023
DISTA: Denoising Spiking Transformer with intrinsic plasticity and spatiotemporal attention
  Boxun Xu, Hejia Geng, Yuxuan Yin, Peng Li · 15 Nov 2023
The Expressibility of Polynomial based Attention Scheme
  Zhao-quan Song, Guangyi Xu, Junze Yin · 30 Oct 2023
miditok: A Python package for MIDI file tokenization
  Nathan Fradet, Jean-Pierre Briot, F. Chhel, A. E. Seghrouchni, Nicolas Gutowski · 26 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
  Yulong Shi, Mingwei Sun, Yongshuai Wang, Hui Sun, Zengqiang Chen · 10 Oct 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
  Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng, K. Iskra, Pete Beckman, Dingwen Tao · 29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
  Yue Zhao, Philipp Krähenbühl · VLM · 28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
  Albert Mohwald · 28 Sep 2023
IFT: Image Fusion Transformer for Ghost-free High Dynamic Range Imaging
  Hai-lin Wang, Wei Li, Yuanyuan Xi, Jie Hu, Hanting Chen, Longyu Li, Yun Wang · 26 Sep 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
  Keyu An, Shiliang Zhang · 26 Sep 2023
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
  Zhonglin Cao, Simone Sciabola, Ye Wang · 20 Sep 2023
Complexity Scaling for Speech Denoising
  Hangting Chen, Jianwei Yu, Chao Weng · 14 Sep 2023
Auto-Regressive Next-Token Predictors are Universal Learners
  Eran Malach · LRM · 13 Sep 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
  Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel · 18 Aug 2023
Retentive Network: A Successor to Transformer for Large Language Models
  Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei · LRM · 17 Jul 2023
Transformers in Reinforcement Learning: A Survey
  Pranav Agarwal, A. Rahman, P. St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou · OffRL · 12 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
  Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei · CLL · 05 Jul 2023
Spike-driven Transformer
  Man Yao, Jiakui Hu, Zhaokun Zhou, Liuliang Yuan, Yonghong Tian, Boxing Xu, Guoqi Li · 04 Jul 2023
Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network
  Zizhuo Li, Jiayi Ma · 04 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
  Yujia Xiao, Shaofei Zhang, Xi Wang, Xuejiao Tan, Lei He, Sheng Zhao, Frank Soong, Tan Lee · 03 Jul 2023
Auto-Spikformer: Spikformer Architecture Search
  Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqing Chen, Shuaijie Shen, Liuliang Yuan, Yonghong Tian · 01 Jun 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
  Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann · 25 May 2023
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
  Ziwei He, Meng-Da Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin · ViT · 24 May 2023
RWKV: Reinventing RNNs for the Transformer Era
  Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu · 22 May 2023
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
  Xiao Wang, Wei Zhou, Qi Zhang, Jie Zhou, Songyang Gao, Junzhe Wang, Menghan Zhang, Xiang Gao, Yunwen Chen, Tao Gui · 22 May 2023
FIT: Far-reaching Interleaved Transformers
  Ting-Li Chen, Lala Li · 22 May 2023
TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model
  Patrick Kahardipraja, Brielen Madureira, David Schlangen · CLL · 18 May 2023
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
  Alexander Moreno, Jonathan Mei, Luke Walters · 15 May 2023