Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.16880
Cited By
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
18 February 2024
Peng Xu
Wenqi Shao
Yonghong Tian
Shitao Tang
Kai-Chuang Zhang
Peng Gao
Fengwei An
Yu Qiao
Ping Luo
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation"
19 / 19 papers shown
Title
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang
Huihong Shi
Haikuo Shao
Zhongfeng Wang
33
0
0
07 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zehan Li
Lefei Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
59
0
0
31 Mar 2025
STADE: Standard Deviation as a Pruning Metric
Diego Coello de Portugal Mecke
Haya Alyoussef
Ilia Koloiarov
Maximilian Stubbemann
Lars Schmidt-Thieme
34
0
0
28 Mar 2025
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs
Chang Gao
Kang Zhao
Jianfei Chen
Liping Jing
47
0
0
24 Mar 2025
Safe Vision-Language Models via Unsafe Weights Manipulation
Moreno DÍncà
E. Peruzzo
Xingqian Xu
Humphrey Shi
N. Sebe
Massimiliano Mancini
MU
60
0
0
14 Mar 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Yong-Jin Liu
Jing Lin
Yiwu Yao
Rongrong Ji
97
1
0
21 Feb 2025
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
J. P. Muñoz
Jinjie Yuan
Nilesh Jain
Mamba
72
1
0
28 Jan 2025
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
52
0
0
11 Nov 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
42
2
0
23 Oct 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang
Minghua Ma
Zekun Wang
Jingchang Chen
Huiming Fan
Liping Shan
Qing Yang
Dongliang Xu
Ming Liu
Bing Qin
38
3
0
20 Sep 2024
Application Specific Compression of Deep Learning Models
Rohit Raj Rai
Angana Borah
Amit Awekar
29
0
0
09 Sep 2024
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
Yupeng Su
Ziyi Guan
Xiaoqun Liu
Tianlai Jin
Dongkuan Wu
G. Chesi
Ngai Wong
Hao Yu
45
1
0
20 Aug 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
45
3
0
12 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Yonghong Tian
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Ping Luo
MQ
50
26
0
10 Jul 2024
BlockPruner: Fine-grained Pruning for Large Language Models
Longguang Zhong
Fanqi Wan
Ruijun Chen
Xiaojun Quan
Liangzhi Li
33
7
0
15 Jun 2024
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models
Peijie Dong
Lujun Li
Zhenheng Tang
Xiang Liu
Xinglin Pan
Qiang-qiang Wang
Xiaowen Chu
62
23
0
05 Jun 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
83
0
22 Apr 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
41
48
0
15 Feb 2024
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
253
1,073
0
05 Oct 2022
1