arXiv:2006.16236
Cited By
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
29 June 2020
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
Papers citing "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention"
Showing 50 of 346 citing papers:
Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform
Josh Alman, Zhao-quan Song (17 May 2025)

Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta, Eric Lundy, Priscilla Burity (09 May 2025)

OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging
Sifan Song, Siyeop Yoon, Pengfei Jin, Sekeun Kim, Matthew Tivnan, ..., Zhiliang Lyu, Dufan Wu, Ning Guo, Xiang Li, Quanzheng Li (08 May 2025) [OOD, ViT]

T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction
Kun Peng, Chaodong Tong, Cong Cao, Hao Peng, Yue Liu, Guanlin Wu, Lei Jiang, Yanbing Liu, Philip S. Yu (08 May 2025) [LMTD]

Generative Models for Long Time Series: Approximately Equivariant Recurrent Network Structures for an Adjusted Training Scheme
Ruwen Fulek, Markus Lange-Hegermann (08 May 2025) [AI4TS]

Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou, Wei Long, Jingbo Lu, Shiyin Jiang, Weiyi You, Haifeng Wu, Shuhang Gu (04 May 2025)

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber (01 May 2025) [MoE, VLM]

Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao, Shuchang Lyu, Zhaoyang Xu, Huiyu Zhou, Jinchang Ren, Shiming Xiang, Xiaomeng Li, Guangliang Cheng (01 May 2025) [Mamba]

Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu, Wenbin Liang, Mo Yu, Anan Liu, Kaipeng Zhang, Lizhuang Ma, Yufei Guo, Jun Wang, Wenqi Zhang (01 May 2025) [MQ]

RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu (30 Apr 2025) [VLM]

From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
Andrew Kiruluta (29 Apr 2025)

Revisiting Reset Mechanisms in Spiking Neural Networks for Sequential Modeling: Specialized Discretization for Binary Activated RNN
Enqi Zhang (24 Apr 2025) [MQ]

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, ..., Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam S. Tsai, Y. Fu (24 Apr 2025)

Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick, Eric P. Xing, Albert Gu (22 Apr 2025) [RALM]

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang, Maneesh Agrawala (17 Apr 2025) [DiffM, VGen]

Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos, Yongtao Wu, Razvan Pascanu, Philip Torr, V. Cevher (17 Apr 2025) [AAML]

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li, Shulei Ji, Zihao Wang, Songruoyao Wu, Jiaxing Yu, Kaipeng Zhang (01 Apr 2025) [MGen, VGen]

From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi, Md Monzurul Islam, Mahmuda Sultana Mimi, Sazzad Bin Bashar Polock, Gaurab Chhetri, Subasish Das (22 Mar 2025) [Mamba, AI4TS]

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter (18 Mar 2025)

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu-Xi Cheng (07 Mar 2025) [MoE]

Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar, Jacob Buckman, Carles Gelada, Sean Zhang (05 Mar 2025)

Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali, Nikolos Gurney, David Pynadath (05 Mar 2025) [AI4TS]

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang (04 Mar 2025) [VLM]

SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures
Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen (03 Mar 2025) [Mamba]

Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu-Xi Cheng (03 Mar 2025)

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen (02 Mar 2025)

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil, M. Beck, Korbinian Poppel, Johannes Brandstetter, G. Klambauer, Razvan Pascanu, Sepp Hochreiter (21 Feb 2025)

A Survey of Model Architectures in Information Retrieval
Zhichao Xu, Fengran Mo, Zhiqi Huang, Crystina Zhang, Puxuan Yu, Bei Wang, Jimmy J. Lin, Vivek Srikumar (21 Feb 2025) [KELM, 3DV]

MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu-Xi Cheng (19 Feb 2025) [KELM]

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar (18 Feb 2025)

Associative Recurrent Memory Transformer
Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Andrey Kravchenko (17 Feb 2025)

Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning
C. Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Enze Xie, Mingyu Gao (04 Feb 2025)

Explaining Context Length Scaling and Bounds for Language Models
Jingzhe Shi, Qinwei Ma, Hongyi Liu, Hang Zhao, Jenq-Neng Hwang (03 Feb 2025) [LRM]

Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou, Mirella Lapata (03 Feb 2025) [MoMe]

Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
Arya Honarpisheh, Mustafa Bozdag, Octavia Camps, Mario Sznaier (03 Feb 2025) [Mamba]

Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz, Jinjie Yuan, Nilesh Jain (28 Jan 2025) [Mamba]

State-space models are accurate and efficient neural operators for dynamical systems
Zheyuan Hu, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi, George Karniadakis (28 Jan 2025) [Mamba, AI4CE]

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li, D. Jiang, Zheng Zhang (25 Jan 2025)

Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu, Liang Feng, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu (21 Jan 2025)

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models
Thibaut Thonet, Jos Rozen, Laurent Besacier (20 Jan 2025) [RALM]

Generative Retrieval for Book search
Yubao Tang, Ruqing Zhang, J. Guo, Maarten de Rijke, Shihao Liu, S. Wang, Dawei Yin, Xueqi Cheng (19 Jan 2025) [RALM]

Towards Scalable and Stable Parallelization of Nonlinear RNNs
Xavier Gonzalez, Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman (17 Jan 2025)

Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Q. Gu, Andrew Chi-Chih Yao (11 Jan 2025)

Key-value memory in the brain
Samuel J. Gershman, Ila Fiete, Kazuki Irie (06 Jan 2025)

A Separable Self-attention Inspired by the State Space Model for Computer Vision
Juntao Zhang, Shaogeng Liu, Kun Bian, You Zhou, Pei Zhang, Jianning Liu, Jun Zhou, Bingyan Liu (03 Jan 2025) [Mamba]

VMamba: Visual State Space Model
Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu (31 Dec 2024) [Mamba]

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu (16 Dec 2024)

Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, E. Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, N. Shah (09 Dec 2024)

MambaIRv2: Attentive State Space Restoration
Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, W. J. Li, Tao Dai, Shu-Tao Xia, Yawei Li (22 Nov 2024) [Mamba]

EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim (22 Nov 2024)