ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Language model compression with weighted low-rank factorization

arXiv:2207.00112 · 30 June 2022
Yen-Chang Hsu, Ting Hua, Sung-En Chang, Qiang Lou, Yilin Shen, Hongxia Jin
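For context on the technique the title names: low-rank factorization compresses a weight matrix by replacing it with two thin factors. The sketch below is only the generic, *unweighted* truncated-SVD baseline (the function name `truncated_svd_compress` is illustrative, not from the paper); the paper's contribution, weighting the factorization by parameter importance (e.g. Fisher information), is not reproduced here.

```python
import numpy as np

def truncated_svd_compress(W, rank):
    """Return factors (A, B) with W ~= A @ B, where A is (m, r) and B is (r, n).

    Plain truncated SVD: the best rank-r approximation in Frobenius norm.
    A weighted variant would instead minimize a per-parameter weighted error.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Toy example: a 64x64 layer compressed to rank 8 stores 4x fewer parameters.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
A, B = truncated_svd_compress(W, rank=8)
params_before = W.size               # 64 * 64 = 4096
params_after = A.size + B.size       # 64*8 + 8*64 = 1024
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

At full rank the factorization is exact; shrinking the rank trades reconstruction error for parameter savings, which is the knob all the SVD-based compression papers below tune.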

Papers citing "Language model compression with weighted low-rank factorization" (50 of 78 shown)
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
Khiem Le, Tuan V. Tran, Ting Hua, Nitesh Chawla · 19 Jun 2025

TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu, Y. Xu, Danilo Mandic · 16 Jun 2025

Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Yibo Yang, Sihao Liu, Chuan Rao, Bang An, Tiancheng Shen, Philip Torr, Ming-Hsuan Yang, Bernard Ghanem · 16 Jun 2025
Olica: Efficient Structured Pruning of Large Language Models without Retraining
Jiujun He, Huazhen Lin · 10 Jun 2025

ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba · 03 Jun 2025

ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang · 30 May 2025
ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang · 28 May 2025

SlimLLM: Accurate Structured Pruning for Large Language Models
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang · 28 May 2025

Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan, F. Tonin, Volkan Cevher · 27 May 2025
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen, Jing Liu, Ye Wang, Matthew Brand, Wang, T. Koike-Akino · 27 May 2025

ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang · 26 May 2025

Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models
Viktoriia Chekalina, Daniil Moskovskiy, Daria Cherniuk, Maxim Kurkin, Andrey Kuznetsov, Evgeny Frolov · 23 May 2025
Zero-Trust Mobility-Aware Authentication Framework for Secure Vehicular Fog Computing Networks
Taimoor Ahmad · 21 May 2025

A3: An Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong, Cheng Zhang, Xinye Cao, Pedro Gimenes, George A. Constantinides, Wayne Luk, Yiren Zhao · 19 May 2025

Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma · 09 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka, Jimmy Dani, Ausmit Mondal, Nitesh Saxena · 08 May 2025

Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra, Mohammad Mahdi Khalili · 05 Apr 2025

An Efficient Training Algorithm for Models with Block-wise Sparsity
Ding Zhu, Zhiqun Zuo, Mohammad Mahdi Khalili · 27 Mar 2025
Large Language Model Compression via the Nested Activation-Aware Decomposition
Jun Lu, Tianyi Xu, Bill Ding, David Li, Yu Kang · 21 Mar 2025

SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
Xin Wang, Samiul Alam, Zhongwei Wan, Jikang Cheng, Hao Fei · 16 Mar 2025

CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang · 07 Mar 2025
Wanda++: Pruning Large Language Models via Regional Gradients
Yifan Yang, Kai Zhen, Bhavana Ganesh, Aram Galstyan, Goeric Huybrechts, ..., S. Bodapati, Nathan Susanj, Zheng Zhang, Jack FitzGerald, Abhishek Kumar · 06 Mar 2025

Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li, Tiancheng Shen, Yao Zhou, Baisong Yang, Zhongying Liu, Masheng Yang, Guohao Li, Yibo Yang, Yujie Zhong, Ming-Hsuan Yang · 24 Feb 2025

When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time?
Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song · 24 Feb 2025
Choose Your Model Size: Any Compression by a Single Gradient Descent
Martin Genzel, Patrick Putzky, Pengfei Zhao, Siyang Song, Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann · 03 Feb 2025

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty · 25 Jan 2025

Krony-PT: GPT2 compressed with Kronecker Products
M. Ayoub Ben Ayad, Jelena Mitrović, Michael Granitzer · 16 Dec 2024
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Wonkyo Choe, Yangfeng Ji, F. Lin · 14 Dec 2024

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang · 08 Dec 2024

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar, Linfeng Wen, Mingu Kang · 15 Nov 2024
MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, Bo Yuan · 01 Nov 2024

BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu · 31 Oct 2024

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, R. Huang, Meng Li · 23 Oct 2024
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang · 02 Oct 2024

MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu · 19 Aug 2024

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang · 15 Aug 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes, Fernando Pereira dos Santos, D. Oliveira, Mauricio Schiezaro, Hélio Pedrini · 15 Aug 2024

SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis
Yuchen Mao, Hongwei Bran Li, Wei Pang, G. Papanastasiou, G. Yang, Chengjia Wang · 13 Aug 2024

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
Utkarsh Saxena, Gobinda Saha, Sakshi Choudhary, Kaushik Roy · 10 Aug 2024

Palu: Compressing KV-Cache with Low-Rank Projection
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, N. Huang, Luis Ceze, Kai-Chiang Wu · 30 Jul 2024
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Zhenyu Zhang, Shiwei Liu, Runjin Chen, Jiawei Zhao, A. Grama, Yuandong Tian, Zhangyang Wang · 15 Jul 2024

PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
Injoon Hwang, Haewon Park, Youngwan Lee, Jooyoung Yang, SunJae Maeng · 13 Jun 2024

Reweighted Solutions for Weighted Low Rank Approximation
David P. Woodruff, T. Yasuda · 04 Jun 2024

Compressing Large Language Models using Low Rank and Low Precision Decomposition
R. Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci · 29 May 2024
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Yang Li, Changsheng Zhao, Hyungtak Lee, Ernie Chang, Yangyang Shi, Vikas Chandra · 24 May 2024

LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
Guangyan Li, Yongqiang Tang, Wensheng Zhang · 15 Apr 2024

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang · 12 Mar 2024

Bias Mitigation in Fine-tuning Pre-trained Models for Enhanced Fairness and Efficiency
Yixuan Zhang, Feng Zhou · 01 Mar 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi, M. Nazemi, Massoud Pedram · 08 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson · 07 Feb 2024