Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
arXiv:2006.16236 · 29 June 2020
Papers citing "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" (50 / 346 papers shown)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto · AAML · 24 May 2024

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo, Shuai Lu, Weihang Zhang, Huiqi Li, Hongen Liao · ViT · 23 May 2024
Retrievable Domain-Sensitive Feature Memory for Multi-Domain Recommendation
Yuang Zhao, Zhaocheng Du, Qinglin Jia, Linxuan Zhang, Zhenhua Dong, Ruiming Tang · 21 May 2024

Asymptotic theory of in-context learning by linear attention
Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan · 20 May 2024

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang · ViT · 19 May 2024

Memory Mosaics
Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou · VLM · 10 May 2024
Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models
Zeyu Yang, Peikun Guo, Khadija Zanna, Akane Sano, Xiaoxue Yang · DiffM · 12 Apr 2024

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
Cai Zhou, Rose Yu, Yusu Wang · 04 Apr 2024

GP-MoLFormer: A Foundation Model For Molecular Generation
Jerret Ross, Brian M. Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das · 04 Apr 2024
Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong · 03 Apr 2024

DE-HNN: An effective neural model for Circuit Netlist representation
Zhishang Luo, Truong-Son Hy, Puoya Tabaghi, Donghyeon Koh, Michael Defferrard, Elahe Rezaei, Ryan Carey, William Rhett Davis, Rajeev Jain, Yusu Wang · 30 Mar 2024

DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang · VLM · 29 Mar 2024

Physics-Informed Diffusion Models
Jan-Hendrik Bastek, WaiChing Sun, D. Kochmann · DiffM, AI4CE · 21 Mar 2024
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence
Sung-Jin Hong, Seokju Cho, Seungryong Kim, Stephen Lin · ViT · 17 Mar 2024

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens · MoE · 14 Mar 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi · VLM · 28 Feb 2024

Linear Transformers are Versatile In-Context Learners
Max Vladymyrov, J. Oswald, Mark Sandler, Rong Ge · 21 Feb 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea · 01 Feb 2024
Delving Deeper Into Astromorphic Transformers
Md. Zesun Ahmed Mia, Malyaban Bal, Abhronil Sengupta · 18 Dec 2023

CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer
Sicheng Wang, Hao Jiang, Lei Xiang · ViT · 14 Dec 2023

Graph Convolutions Enrich the Self-Attention in Transformers!
Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park · 07 Dec 2023

TD-Net: A Tri-domain network for sparse-view CT reconstruction
Xinyuan Wang, Changqing Su, Bo Xiong · MedIm · 26 Nov 2023

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang · VLM · 21 Nov 2023
DISTA: Denoising Spiking Transformer with intrinsic plasticity and spatiotemporal attention
Boxun Xu, Hejia Geng, Yuxuan Yin, Peng Li · 15 Nov 2023

The Expressibility of Polynomial based Attention Scheme
Zhao-quan Song, Guangyi Xu, Junze Yin · 30 Oct 2023

miditok: A Python package for MIDI file tokenization
Nathan Fradet, Jean-Pierre Briot, F. Chhel, A. E. Seghrouchni, Nicolas Gutowski · 26 Oct 2023

EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi, Mingwei Sun, Yongshuai Wang, Hui Sun, Zengqiang Chen · 10 Oct 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng, K. Iskra, Pete Beckman, Dingwen Tao · 29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao, Philipp Krähenbühl · VLM · 28 Sep 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald · 28 Sep 2023
IFT: Image Fusion Transformer for Ghost-free High Dynamic Range Imaging
Hai-lin Wang, Wei Li, Yuanyuan Xi, Jie Hu, Hanting Chen, Longyu Li, Yun Wang · 26 Sep 2023

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An, Shiliang Zhang · 26 Sep 2023

Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
Zhonglin Cao, Simone Sciabola, Ye Wang · 20 Sep 2023

Complexity Scaling for Speech Denoising
Hangting Chen, Jianwei Yu, Chao Weng · 14 Sep 2023

Auto-Regressive Next-Token Predictors are Universal Learners
Eran Malach · LRM · 13 Sep 2023

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel · 18 Aug 2023
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei · LRM · 17 Jul 2023

Transformers in Reinforcement Learning: A Survey
Pranav Agarwal, A. Rahman, P. St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou · OffRL · 12 Jul 2023

LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei · CLL · 05 Jul 2023

Spike-driven Transformer
Man Yao, Jiakui Hu, Zhaokun Zhou, Liuliang Yuan, Yonghong Tian, Boxing Xu, Guoqi Li · 04 Jul 2023
Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network
Zizhuo Li, Jiayi Ma · 04 Jul 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao, Shaofei Zhang, Xi Wang, Xuejiao Tan, Lei He, Sheng Zhao, Frank Soong, Tan Lee · 03 Jul 2023

Auto-Spikformer: Spikformer Architecture Search
Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqing Chen, Shuaijie Shen, Liuliang Yuan, Yonghong Tian · 01 Jun 2023

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann · 25 May 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng-Da Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin · ViT · 24 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng, Eric Alcaide, Quentin G. Anthony, Alon Albalak, Samuel Arcadinho, ..., Qihang Zhao, P. Zhou, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu · 22 May 2023

Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
Xiao Wang, Wei Zhou, Qi Zhang, Jie Zhou, Songyang Gao, Junzhe Wang, Menghan Zhang, Xiang Gao, Yunwen Chen, Tao Gui · 22 May 2023

FIT: Far-reaching Interleaved Transformers
Ting-Li Chen, Lala Li · 22 May 2023

TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model
Patrick Kahardipraja, Brielen Madureira, David Schlangen · CLL · 18 May 2023

SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno, Jonathan Mei, Luke Walters · 15 May 2023