2307.08621
Cited By
Retentive Network: A Successor to Transformer for Large Language Models
17 July 2023
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
Papers citing
"Retentive Network: A Successor to Transformer for Large Language Models"
50 / 208 papers shown
Title
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
41
4
0
18 Oct 2024
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Frederick Souza Leite
Henry Mauranen
Aziza Zhanabatyrova
Yu Xiao
24
1
0
17 Oct 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Yizhao Gao
Zhichen Zeng
Dayou Du
Shijie Cao
Hayden Kwok-Hay So
...
Junjie Lai
Mao Yang
Ting Cao
Fan Yang
M. Yang
52
19
0
17 Oct 2024
On Divergence Measures for Training GFlowNets
Tiago da Silva
Eliezer de Souza da Silva
Diego Mesquita
BDL
29
1
0
12 Oct 2024
Efficiently Scanning and Resampling Spatio-Temporal Tasks with Irregular Observations
Bryce Ferenczi
Michael G. Burke
Tom Drummond
31
0
0
11 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
29
4
0
11 Oct 2024
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
Yingfa Chen
Xinrong Zhang
Shengding Hu
Xu Han
Zhiyuan Liu
Maosong Sun
Mamba
59
2
0
09 Oct 2024
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
28
0
0
09 Oct 2024
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang
Xuyang Ge
Wentao Shu
Qiong Tang
Yunhua Zhou
Zhengfu He
Xipeng Qiu
29
7
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
24
0
0
09 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
J. Li
Weiyao Lin
VLM
38
1
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
82
0
0
09 Oct 2024
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension
Ning Wang
Zekun Li
Tongxin Bai
Guoqi Li
27
0
0
05 Oct 2024
RetCompletion: High-Speed Inference Image Completion with Retentive Network
Yueyang Cang
P. Hu
Xiaoteng Zhang
Xingtong Wang
Yuhang Liu
VLM
31
0
0
05 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
44
1
0
04 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
Were RNNs All We Needed?
Leo Feng
Frederick Tung
Mohamed Osama Ahmed
Yoshua Bengio
Hossein Hajimirsadegh
AI4TS
29
14
1
02 Oct 2024
On the Power of Decision Trees in Auto-Regressive Language Modeling
Yulu Gan
Tomer Galanti
T. Poggio
Eran Malach
AI4CE
13
0
0
27 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
144
1
0
20 Sep 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
65
17
0
11 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
34
4
0
03 Sep 2024
Shifted Window Fourier Transform And Retention For Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
VLM
36
0
0
25 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
53
24
0
19 Aug 2024
Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning
Tianning Zhang
Feng Liu
Yuming Yuan
Rui Su
Wanli Ouyang
Lei Bai
26
0
0
13 Aug 2024
Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness
Xiaojing Fan
Chunliang Tao
AAML
39
28
0
08 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
37
2
0
01 Aug 2024
Enhanced Structured State Space Models via Grouped FIR Filtering and Attention Sink Mechanisms
Yueran Zhang
Yating Yu
Lingtong Min
Mamba
23
0
0
01 Aug 2024
LION: Linear Group RNN for 3D Object Detection in Point Clouds
Zhe Liu
Jinghua Hou
Xinyu Wang
Xiaoqing Ye
Jingdong Wang
Hengshuang Zhao
Xiang Bai
3DPC
53
11
0
25 Jul 2024
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
33
0
25 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
51
10
0
19 Jul 2024
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
Daniel Goldstein
Fares Obeid
Eric Alcaide
Guangyu Song
Eugene Cheah
VLM
AI4TS
37
7
0
16 Jul 2024
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Sukjun Hwang
Aakash Lahoti
Tri Dao
Albert Gu
Mamba
62
12
0
13 Jul 2024
ST-RetNet: A Long-term Spatial-Temporal Traffic Flow Prediction Method
Baichao Long
Wang Zhu
Jianli Xiao
GNN
AI4TS
23
1
0
13 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
50
113
0
11 Jul 2024
Spatial-Temporal Attention Model for Traffic State Estimation with Sparse Internet of Vehicles
Jianzhe Xue
Dongcheng Yuan
Yu Sun
Tianqi Zhang
Wenchao Xu
Haibo Zhou
Xuemin Shen
32
1
0
10 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Yibo Yang
Jianlong Wu
Guohao Li
Liqiang Nie
Min Zhang
Mamba
41
5
0
08 Jul 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou
Yadong Qu
Zixiao Wang
Zicheng Li
Boqiang Zhang
Hongtao Xie
47
1
0
08 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
72
83
0
02 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
40
17
0
01 Jul 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
Hyun Joon Park
Jin Sob Kim
Wooseok Shin
Sung Won Han
DiffM
33
2
0
27 Jun 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
40
0
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
35
18
0
24 Jun 2024
Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces
Zhaohui Chen
Elyas Asadi Shamsabadi
Sheng Jiang
Luming Shen
Daniel Dias-da-Costa
Mamba
39
3
0
24 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
46
17
0
21 Jun 2024
CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework
Shaohuang Wang
Lun Wang
Yunhan Bu
Tianwei Huang
35
2
0
18 Jun 2024
Generalisation to unseen topologies: Towards control of biological neural network activity
Laurens Engwegen
Daan Brinks
Wendelin Böhmer
MedIm
AI4CE
34
0
0
17 Jun 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
S. Bhattamishra
Michael Hahn
Phil Blunsom
Varun Kanade
GNN
41
9
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
36
0
0
13 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang
Siqi Miao
Pan Li
44
7
0
09 Jun 2024