Language model compression with weighted low-rank factorization
arXiv:2207.00112 · 30 June 2022
Yen-Chang Hsu
Ting Hua
Sung-En Chang
Qiang Lou
Yilin Shen
Hongxia Jin
Papers citing "Language model compression with weighted low-rank factorization" (50 of 78 papers shown)
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Khiem Le
Tuan V. Tran
Ting Hua
Nitesh Chawla
MoE
12
0
0
19 Jun 2025
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu
Y. Xu
Danilo Mandic
41
0
0
16 Jun 2025
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Yibo Yang
Sihao Liu
Chuan Rao
Bang An
Tiancheng Shen
Philip Torr
Ming-Hsuan Yang
Bernard Ghanem
37
0
0
16 Jun 2025
Olica: Efficient Structured Pruning of Large Language Models without Retraining
Jiujun He
Huazhen Lin
33
0
0
10 Jun 2025
ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations
Ekaterina Grishina
Mikhail Gorbunov
Maxim Rakhuba
63
0
0
03 Jun 2025
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Xianglong Yan
Zhiteng Li
Tianao Zhang
Linghe Kong
Yulun Zhang
Xiaokang Yang
82
0
0
30 May 2025
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi
Zhenglun Kong
Geng Yuan
Shaoyi Huang
56
0
0
28 May 2025
SlimLLM: Accurate Structured Pruning for Large Language Models
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
34
0
0
28 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
83
0
0
27 May 2025
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen
Jing Liu
Ye Wang
Matthew Brand
Wang
T. Koike-Akino
77
0
0
27 May 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
55
0
0
26 May 2025
Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models
Viktoriia Chekalina
Daniil Moskovskiy
Daria Cherniuk
Maxim Kurkin
Andrey Kuznetsov
Evgeny Frolov
220
0
0
23 May 2025
Zero-Trust Mobility-Aware Authentication Framework for Secure Vehicular Fog Computing Networks
Taimoor Ahmad
44
0
0
21 May 2025
A3: An Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong
Cheng Zhang
Xinye Cao
Pedro Gimenes
George A. Constantinides
Wayne Luk
Yiren Zhao
OffRL
MQ
136
1
0
19 May 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen
Keyi Li
Yifan Jia
Le Ye
Yufei Ma
DiffM
86
0
0
09 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
75
0
0
08 May 2025
Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra
Mohammad Mahdi Khalili
AI4CE
89
0
0
05 Apr 2025
An Efficient Training Algorithm for Models with Block-wise Sparsity
Ding Zhu
Zhiqun Zuo
Mohammad Mahdi Khalili
57
0
0
27 Mar 2025
Large Language Model Compression via the Nested Activation-Aware Decomposition
Jun Lu
Tianyi Xu
Bill Ding
David Li
Yu Kang
84
1
0
21 Mar 2025
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
Xin Wang
Samiul Alam
Zhongwei Wan
Jikang Cheng
Hao Fei
MQ
115
4
0
16 Mar 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami
Mohammad Akbari
Kevin Cannons
Yong Zhang
113
0
0
07 Mar 2025
Wanda++: Pruning Large Language Models via Regional Gradients
Yifan Yang
Kai Zhen
Bhavana Ganesh
Aram Galstyan
Goeric Huybrechts
...
S. Bodapati
Nathan Susanj
Zheng Zhang
Jack FitzGerald
Abhishek Kumar
231
3
0
06 Mar 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li
Tiancheng Shen
Yao Zhou
Baisong Yang
Zhongying Liu
Masheng Yang
Guohao Li
Yibo Yang
Yujie Zhong
Ming-Hsuan Yang
88
1
0
24 Feb 2025
When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li
Yingyu Liang
Zhenmei Shi
Zhao Song
76
5
0
24 Feb 2025
Choose Your Model Size: Any Compression by a Single Gradient Descent
Martin Genzel
Patrick Putzky
Pengfei Zhao
Siyang Song
Mattes Mollenhauer
Robert Seidel
Stefan Dietzel
Thomas Wollmann
110
0
0
03 Feb 2025
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
127
4
0
25 Jan 2025
Krony-PT: GPT2 compressed with Kronecker Products
M. Ayoub Ben Ayad
Jelena Mitrović
Michael Granitzer
107
0
0
16 Dec 2024
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Wonkyo Choe
Yangfeng Ji
F. Lin
153
1
0
14 Dec 2024
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Fan Wang
Juyong Jiang
Chansung Park
Sunghun Kim
Jing Tang
205
2
0
08 Dec 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
51
0
0
15 Nov 2024
MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
Yuanlin Duan
Wenqi Jia
Miao Yin
Yu Cheng
Bo Yuan
MoE
153
7
0
01 Nov 2024
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Xinghao Wang
Pengyu Wang
Bo Wang
Dong Zhang
Yunhua Zhou
Xipeng Qiu
MQ
89
1
0
31 Oct 2024
MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
Zebin Yang
Renze Chen
Taiqiang Wu
Ngai Wong
Yun Liang
Runsheng Wang
R. Huang
Meng Li
MQ
90
1
0
23 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang
Yu-Guang Chen
Ing-Chao Lin
Bing Li
Grace Li Zhang
90
4
0
02 Oct 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
171
13
0
19 Aug 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao
Menghang Dong
Rongyu Zhang
Wenzhao Zheng
Yunpeng Zhang
Huanrui Yang
Dalong Du
Kurt Keutzer
Shanghang Zhang
116
0
0
15 Aug 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
91
11
0
15 Aug 2024
SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis
Yuchen Mao
Hongwei Bran Li
Wei Pang
G. Papanastasiou
G. Yang
Chengjia Wang
MedIm
78
2
0
13 Aug 2024
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
Utkarsh Saxena
Gobinda Saha
Sakshi Choudhary
Kaushik Roy
103
18
0
10 Aug 2024
Palu: Compressing KV-Cache with Low-Rank Projection
Chi-Chih Chang
Wei-Cheng Lin
Chien-Yu Lin
Chong-Yan Chen
Yu-Fang Hu
Pei-Shuo Wang
N. Huang
Luis Ceze
Kai-Chiang Wu
110
2
0
30 Jul 2024
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal
Yifan Wang
Zhenyu Zhang
Shiwei Liu
Runjin Chen
Jiawei Zhao
A. Grama
Yuandong Tian
Zhangyang Wang
85
15
0
15 Jul 2024
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
Injoon Hwang
Haewon Park
Youngwan Lee
Jooyoung Yang
SunJae Maeng
AI4CE
53
3
0
13 Jun 2024
Reweighted Solutions for Weighted Low Rank Approximation
David P. Woodruff
T. Yasuda
80
1
0
04 Jun 2024
Compressing Large Language Models using Low Rank and Low Precision Decomposition
R. Saha
Naomi Sagan
Varun Srivastava
Andrea J. Goldsmith
Mert Pilanci
MQ
70
22
0
29 May 2024
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Yang Li
Changsheng Zhao
Hyungtak Lee
Ernie Chang
Yangyang Shi
Vikas Chandra
66
0
0
24 May 2024
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
Guangyan Li
Yongqiang Tang
Wensheng Zhang
86
6
0
15 Apr 2024
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang
Yu Zheng
Zhongwei Wan
Mi Zhang
MQ
165
64
0
12 Mar 2024
Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency
Yixuan Zhang
Feng Zhou
55
3
0
01 Mar 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi
M. Nazemi
Massoud Pedram
ViT
MQ
91
2
0
08 Feb 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei
Kaixuan Huang
Yangsibo Huang
Tinghao Xie
Xiangyu Qi
Mengzhou Xia
Prateek Mittal
Mengdi Wang
Peter Henderson
AAML
162
118
0
07 Feb 2024