Title | Authors | Tags | Counts | Date
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond | Costin-Andrei Oncescu, Sanket Purandare, Stratos Idreos, Sham Kakade | VLM, AI4TS, 3DV | 26 0 0 | 16 Oct 2024
FutureFill: Fast Generation from Convolutional Sequence Models | Naman Agarwal, Xinyi Chen, Evan Dogariu, Vlad Feinberg, Daniel Suo, Peter L. Bartlett, Elad Hazan | AI4TS, MQ | 43 2 0 | 02 Oct 2024
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling | Harry Jake Cunningham, Giorgio Giannone, Mingtian Zhang, M. Deisenroth | | 30 0 0 | 18 Aug 2024
LoCoCo: Dropping In Convolutions for Long Context Compression | Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen | | 46 9 0 | 08 Jun 2024
State-Free Inference of State-Space Models: The Transfer Function Approach | Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T.H. Smith, Ramin Hasani, ..., Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli | | 44 5 0 | 10 May 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey | Tianlin Li, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, ..., Bowei Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang | Mamba | 33 49 0 | 15 Apr 2024
Mechanistic Design and Scaling of Hybrid Architectures | Michael Poli, Armin W. Thomas, Eric N. D. Nguyen, Pragaash Ponnusamy, Bjorn Deiseroth, ..., Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli | MoE | 57 21 0 | 26 Mar 2024
Towards a theory of model distillation | Enric Boix-Adserà | FedML, VLM | 44 6 0 | 14 Mar 2024
Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation | Haruka Ezoe, Kazuhiro Sato | | 30 0 0 | 25 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era | Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, S. Melacci | | 24 9 0 | 12 Feb 2024
Scavenging Hyena: Distilling Transformers into Long Convolution Models | Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami Nur Islam, Wassim Jabbour, Laurence Liang | | 21 3 0 | 31 Jan 2024
Gated Linear Attention Transformers with Hardware-Efficient Training | Aaron Courville, Bailin Wang, Songlin Yang, Yikang Shen, Yoon Kim | | 48 142 0 | 11 Dec 2023
Resurrecting Recurrent Neural Networks for Long Sequences | Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De | | 88 268 0 | 11 Mar 2023
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes | David W. Romero, Robert-Jan Bruintjes, Jakub M. Tomczak, Erik J. Bekkers, Mark Hoogendoorn, Jan van Gemert | | 80 82 0 | 15 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 279 1,996 0 | 31 Dec 2020