Monarch: Expressive Structured Matrices for Efficient and Accurate Training
arXiv:2204.00595 (1 April 2022)
Tri Dao, Beidi Chen, N. Sohoni, Arjun D Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré
Papers citing "Monarch: Expressive Structured Matrices for Efficient and Accurate Training" (50 of 66 papers shown)
Block Circulant Adapter for Large Language Models
Xinyu Ding, Meiqi Wang, Siyu Liao, Zhongfeng Wang (01 May 2025)

MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li (24 Mar 2025)

Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected
Yingtao Zhang, Jialin Zhao, Wenjing Wu, Ziheng Liao, Umberto Michieli, C. Cannistraci (31 Jan 2025)

BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee, Soo Min Kwon, Qing Qu, Hun-Seok Kim (28 Oct 2024)

Mixture of Parrots: Experts improve memorization more than reasoning [MoE]
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach (24 Oct 2024)
Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models
Mingxue Xu, Sadia Sharmin, Danilo Mandic (03 Oct 2024)

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson (03 Oct 2024)

Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement [TTA, AI4TS]
Gaurav Patel, Christopher Sandino, Behrooz Mahasseni, Ellen L. Zippi, Erdrin Azemi, Ali Moin, Juri Minxha (03 Oct 2024)

Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Vladimír Boža, Vladimír Macko (27 Sep 2024)

Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks
Ashwin Samudre, Mircea Petrache, Brian D. Nord, Shubhendu Trivedi (18 Sep 2024)
MoRe Fine-Tuning with 10x Fewer Parameters
Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala (30 Aug 2024)

Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Pihe Hu, Shaolong Li, Longbo Huang (21 Aug 2024)

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson (25 Jul 2024)

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers [Mamba]
Sukjun Hwang, Aakash Lahoti, Tri Dao, Albert Gu (13 Jul 2024)
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
Xiuying Wei, Skander Moalla, Razvan Pascanu, Çağlar Gülçehre (24 Jun 2024)

An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar (17 Jun 2024)

Group and Shuffle: Efficient Structured Orthogonal Parametrization
Mikhail Gorbunov, Nikolay Yudin, Vera Soboleva, Aibek Alanov, Alexey Naumov, Maxim Rakhuba (14 Jun 2024)

Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson (10 Jun 2024)

Language Model Cascades: Token-level uncertainty and beyond [UQLM]
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, A. Menon, Sanjiv Kumar (15 Apr 2024)
Adaptive Patching for High-resolution Image Segmentation with Transformers [MedIm]
Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Mohamed Wahib, M. Munetomo (15 Apr 2024)

MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Ali Behrouz, Michele Santacatterina, Ramin Zabih (29 Mar 2024)

Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model
Haoyun Xu, Runzhe Zhan, Derek F. Wong, Lidia S. Chao (18 Mar 2024)

Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu (17 Mar 2024)
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models [MedIm]
Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie (01 Mar 2024)

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [VLM]
Mahdi Karami, Ali Ghodsi (28 Feb 2024)

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference
Siyu Ren, Kenny Q. Zhu (09 Feb 2024)

Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville, Bailin Wang, Songlin Yang, Yikang Shen, Yoon Kim (11 Dec 2023)
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang (01 Dec 2023)

SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar, Yunsheng Li, Nuno Vasconcelos (01 Dec 2023)

Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota, Binod Bhattarai (30 Nov 2023)

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, ..., Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf (10 Nov 2023)
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores [VLM]
Daniel Y. Fu, Hermann Kumbong, Eric N. D. Nguyen, Christopher Ré (10 Nov 2023)

Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices
Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen P. Boyd (30 Oct 2023)

Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
Changwoo Lee, Hun-Seok Kim (29 Oct 2023)

Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs [AI4CE]
Jean Kossaifi, Nikola B. Kovachki, Kamyar Azizzadenesheli, Anima Anandkumar (29 Sep 2023)

InRank: Incremental Low-Rank Learning
Jiawei Zhao, Yifei Zhang, Beidi Chen, F. Schafer, Anima Anandkumar (20 Jun 2023)

Does a sparse ReLU network training problem always admit an optimum?
Quoc-Tung Le, E. Riccietti, Rémi Gribonval (05 Jun 2023)
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, ..., Zhimeng Jiang, Kaixiong Zhou, V. Chaudhary, Shuai Xu, Xia Hu (24 May 2023)

Cuttlefish: Low-Rank Model Training without All the Tuning [OffRL]
Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos (04 May 2023)

Sparsity in neural networks can improve their privacy
Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen (20 Apr 2023)

STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler (15 Apr 2023)

Can sparsity improve the privacy of neural networks?
Antoine Gonon, Léon Zheng, Clément Lalanne, Quoc-Tung Le, Guillaume Lauga, Can Pouliquen (11 Apr 2023)
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers [ViT]
Cong Wei, Brendan Duke, R. Jiang, P. Aarabi, Graham W. Taylor, Florian Shkurti (24 Mar 2023)

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie (21 Mar 2023)

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models [MoE, AI4CE]
Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, D. DeCoste, Sean Lie, Shreyas Saxena (18 Mar 2023)

Learning to Grow Pretrained Models for Efficient Transformer Training
Peihao Wang, Yikang Shen, Lucas Torroba Hennigen, P. Greengard, Leonid Karlinsky, Rogerio Feris, David D. Cox, Zhangyang Wang, Yoon Kim (02 Mar 2023)
Hyena Hierarchy: Towards Larger Convolutional Language Models [VLM]
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré (21 Feb 2023)

Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré (13 Feb 2023)

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu, Zhangyang Wang (06 Feb 2023)

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré (28 Dec 2022)