Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.18780
Cited By
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
28 October 2023
Stefano Massaroli
Michael Poli
Daniel Y. Fu
Hermann Kumbong
Rom N. Parnichkun
Aman Timalsina
David W. Romero
Quinn McIntyre
Beidi Chen
Atri Rudra
Ce Zhang
Christopher Ré
Stefano Ermon
Yoshua Bengio
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions"
15 / 15 papers shown
Title
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Costin-Andrei Oncescu
Sanket Purandare
Stratos Idreos
Sham Kakade
VLM
AI4TS
3DV
26
0
0
16 Oct 2024
FutureFill: Fast Generation from Convolutional Sequence Models
Naman Agarwal
Xinyi Chen
Evan Dogariu
Vlad Feinberg
Daniel Suo
Peter L. Bartlett
Elad Hazan
AI4TS
MQ
43
2
0
02 Oct 2024
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Harry Jake Cunningham
Giorgio Giannone
Mingtian Zhang
M. Deisenroth
30
0
0
18 Aug 2024
LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai
Yuandong Tian
Zhangyang Wang
Beidi Chen
46
9
0
08 Jun 2024
State-Free Inference of State-Space Models: The Transfer Function Approach
Rom N. Parnichkun
Stefano Massaroli
Alessandro Moro
Jimmy T.H. Smith
Ramin Hasani
...
Hajime Asama
Stefano Ermon
Taiji Suzuki
Atsushi Yamashita
Michael Poli
44
5
0
10 May 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Tianlin Li
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
33
49
0
15 Apr 2024
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli
Armin W. Thomas
Eric N. D. Nguyen
Pragaash Ponnusamy
Bjorn Deiseroth
...
Brian Hie
Stefano Ermon
Christopher Ré
Ce Zhang
Stefano Massaroli
MoE
57
21
0
26 Mar 2024
Towards a theory of model distillation
Enric Boix-Adserà
FedML
VLM
44
6
0
14 Mar 2024
Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation
Haruka Ezoe
Kazuhiro Sato
30
0
0
25 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi
Michele Casoni
Alessandro Betti
Tommaso Guidi
Marco Gori
S. Melacci
24
9
0
12 Feb 2024
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Tokiniaina Raharison Ralambomihanta
Shahrad Mohammadzadeh
Mohammad Sami Nur Islam
Wassim Jabbour
Laurence Liang
21
3
0
31 Jan 2024
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
48
142
0
11 Dec 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
88
268
0
11 Mar 2023
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes
David W. Romero
Robert-Jan Bruintjes
Jakub M. Tomczak
Erik J. Bekkers
Mark Hoogendoorn
Jan van Gemert
80
82
0
15 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
279
1,996
0
31 Dec 2020
1