Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.15347
Cited By
A Comprehensive Survey of Compression Algorithms for Language Models
27 January 2024
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Comprehensive Survey of Compression Algorithms for Language Models"
20 / 20 papers shown
Title
Lossless Compression for LLM Tensor Incremental Snapshots
Daniel Waddington
Cornel Constantinescu
9
0
0
14 May 2025
Zero-shot Quantization: A Comprehensive Survey
Minjun Kim
Jaehyeon Choi
Jongkeun Lee
Wonjin Cho
U. Kang
MQ
23
0
0
14 May 2025
The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
38
0
0
20 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
J. Zhao
Hao Wu
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
60
1
0
18 Feb 2025
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song
Zixuan Pang
Wenhao Wang
Zihao Wang
XiaoFeng Wang
Hongbo Chen
Wei Song
Yier Jin
Dan Meng
Rui Hou
56
7
0
30 Sep 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
M. Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
39
37
0
19 Jul 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
83
0
22 Apr 2024
A Survey on the Memory Mechanism of Large Language Model based Agents
Zeyu Zhang
Xiaohe Bo
Chen Ma
Rui Li
Xu Chen
Quanyu Dai
Jieming Zhu
Zhenhua Dong
Ji-Rong Wen
LLMAG
KELM
42
107
0
21 Apr 2024
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
37
5
0
07 Aug 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao
Xiaoxia Wu
Cheng-rong Li
Stephen Youn
Yuxiong He
MQ
63
57
0
15 Mar 2023
BiT: Robustly Binarized Multi-distilled Transformer
Zechun Liu
Barlas Oğuz
Aasish Pappu
Lin Xiao
Scott Yih
Meng Li
Raghuraman Krishnamoorthi
Yashar Mehdad
MQ
50
52
0
25 May 2022
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
141
684
0
31 Jan 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
102
341
0
05 Jan 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
142
221
0
31 Dec 2020
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu
Shaojun Zhou
Qihong Yang
Junjie Tang
Guiquan Liu
Kaikui Liu
Xiaolong Li
37
57
0
14 Dec 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
233
576
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,567
0
17 Apr 2017
A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh
Oscar Täckström
Dipanjan Das
Jakob Uszkoreit
207
1,367
0
06 Jun 2016
1