2310.01382
Compressing LLMs: The Truth is Rarely Pure and Never Simple
2 October 2023
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
Papers citing
"Compressing LLMs: The Truth is Rarely Pure and Never Simple"
38 / 38 papers shown
Title
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
23
0
0
12 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
32
3
0
09 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
23
0
0
05 May 2025
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun
Han Yu
Lizhen Cui
Xiaoxiao Li
96
0
0
03 May 2025
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
E. J. Husom
Arda Goknil
Merve Astekin
Lwin Khin Shar
Andre Kåsen
S. Sen
Benedikt Andreas Mithassel
Ahmet Soylu
MQ
43
0
0
04 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zehan Li
L. Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
59
0
0
31 Mar 2025
Q&C: When Quantization Meets Cache in Efficient Image Generation
Xin Ding
X. Li
Haotong Qin
Zhibo Chen
DiffM
MQ
75
0
0
04 Mar 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
61
1
0
24 Feb 2025
The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
41
0
0
20 Feb 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
179
0
0
08 Jan 2025
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
65
0
0
23 Nov 2024
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Nasib Ullah
Erik Schultheis
Mike Lasby
Yani Andrew Ioannou
Rohit Babbar
35
0
0
05 Nov 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
42
2
0
23 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
MatMamba: A Matryoshka State Space Model
Abhinav Shukla
Sai H. Vemprala
Aditya Kusupati
Ashish Kapoor
Mamba
28
0
0
09 Oct 2024
Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models
Bishwash Khanal
Jeffery M. Capone
28
1
0
17 Sep 2024
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang
Yuezhou Hu
Guohao Jian
Jun Zhu
Jianfei Chen
35
5
0
30 Jul 2024
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
Ajay Jaiswal
Lu Yin
Zhenyu (Allen) Zhang
Shiwei Liu
Jiawei Zhao
Yuandong Tian
Zhangyang Wang
38
14
0
15 Jul 2024
Accuracy is Not All You Need
Abhinav Dutta
Sanjeev Krishnan
Nipun Kwatra
Ramachandran Ramjee
46
3
0
12 Jul 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Zhichao Xu
Ashim Gupta
Tao Li
Oliver Bentham
Vivek Srikumar
52
8
0
06 Jul 2024
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu
Yuan Meng
Fang Wu
Shenhao Peng
Hang Yao
Chaoyu Guan
Chen Tang
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
58
7
0
15 Jun 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Xudong Lu
Aojun Zhou
Yuhui Xu
Renrui Zhang
Peng Gao
Hongsheng Li
34
7
0
25 May 2024
Advances and Open Challenges in Federated Learning with Foundation Models
Chao Ren
Han Yu
Hongyi Peng
Xiaoli Tang
Anran Li
...
A. Tan
Bo Zhao
Xiaoxiao Li
Zengxiang Li
Qiang Yang
FedML
AIFin
AI4CE
78
7
0
23 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
83
0
22 Apr 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong
Jinhao Duan
Chenhui Zhang
Zhangheng Li
Chulin Xie
...
B. Kailkhura
Dan Hendrycks
Dawn Song
Zhangyang Wang
Bo-wen Li
36
24
0
18 Mar 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
41
51
0
27 Feb 2024
HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
56
4
0
14 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
65
76
0
23 Dec 2023
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
Max Zimmer
Megi Andoni
Christoph Spiegel
Sebastian Pokutta
VLM
52
10
0
23 Dec 2023
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
G. Chrysostomou
Zhixue Zhao
Miles Williams
Nikolaos Aletras
HILM
34
10
0
15 Nov 2023
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
39
22
0
11 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu (Allen) Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
28
78
0
08 Oct 2023
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
Gintare Karolina Dziugaite
LRM
51
5
0
07 Oct 2023
Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
27
7
0
29 Sep 2023
Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal
Shiwei Liu
Tianlong Chen
Ying Ding
Zhangyang Wang
VLM
32
22
0
18 Jun 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
369
0
13 Mar 2023
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
105
341
0
05 Jan 2021
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
156
345
0
23 Jul 2020