ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.07633
  4. Cited By
A Survey on Model Compression for Large Language Models

A Survey on Model Compression for Large Language Models

15 August 2023
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
ArXivPDFHTML

Papers citing "A Survey on Model Compression for Large Language Models"

50 / 143 papers shown
Title
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
C. Hong
AI4CE
31
0
0
11 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
32
3
0
09 May 2025
Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning
Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning
Syed Mhamudul Hasan
Hussein Zangoti
Iraklis Anagnostopoulos
Abdur R. Shahid
AAML
29
0
0
09 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
48
0
0
06 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
58
0
0
05 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
54
0
0
05 May 2025
A Survey on Privacy Risks and Protection in Large Language Models
A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen
Xiuze Zhou
Yuanguo Lin
Shibo Feng
Li Shen
Pengcheng Wu
AILaw
PILM
147
0
0
04 May 2025
ICQuant: Index Coding enables Low-bit LLM Quantization
ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li
Osama A. Hanna
Christina Fragouli
Suhas Diggavi
MQ
62
0
0
01 May 2025
Efficient LLMs with AMP: Attention Heads and MLP Pruning
Efficient LLMs with AMP: Attention Heads and MLP Pruning
Leandro Giusti Mugnaini
Bruno Yamamoto
Lucas Lauton de Alcantara
Victor Zacarias
Edson Bollis
Lucas Pellicer
A. H. R. Costa
Artur Jordao
47
0
0
29 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
131
0
0
23 Apr 2025
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Daniel Hendriks
Philipp Spitzer
Niklas Kühl
G. Satzger
27
1
0
22 Apr 2025
ACoRN: Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models
ACoRN: Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models
Singon Kim
Gunho Jung
Seong-Whan Lee
RALM
33
0
0
17 Apr 2025
Résumé abstractif à partir dúne transcription audio
Résumé abstractif à partir dúne transcription audio
Ilia Derkach
39
0
0
16 Apr 2025
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
Deyu Cao
Samin Aref
MQ
27
0
0
14 Apr 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
Zhuohan Ge
Nicole Hu
Darian Li
Yubo Wang
Shihao Qi
Yuming Xu
Han Shi
J. Zhang
AI4MH
63
0
0
03 Apr 2025
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang
Yusen Zhang
Prasenjit Mitra
Rui Zhang
MQ
LRM
51
2
0
02 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Z. Li
L. Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
59
0
0
31 Mar 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
143
0
0
28 Mar 2025
Multi-Task Semantic Communications via Large Models
Multi-Task Semantic Communications via Large Models
Wanli Ni
Zhijin Qin
Haofeng Sun
Xiaoming Tao
Zhu Han
35
0
0
28 Mar 2025
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent
Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent
Humza Nusrat
Bing Luo
Ryan Hall
Joshua Kim
H. Bagher-Ebadian
Anthony Doemer
B. Movsas
Kundan Thind
AI4CE
34
0
0
21 Mar 2025
A Time Series Multitask Framework Integrating a Large Language Model, Pre-Trained Time Series Model, and Knowledge Graph
Shule Hao
Junpeng Bao
Chuncheng Lu
AI4TS
KELM
49
0
0
10 Mar 2025
TR-DQ: Time-Rotation Diffusion Quantization
Yihua Shao
Deyang Lin
Fanhu Zeng
Minxi Yan
M. Zhang
...
Haozhe Wang
J. Guo
Yan Wang
Haotong Qin
Hao Tang
MQ
DiffM
77
1
0
09 Mar 2025
Towards Explainable Doctor Recommendation with Large Language Models
Ziyang Zeng
Dongyuan Li
Yuqing Yang
LM&MA
AI4TS
63
0
0
04 Mar 2025
LLM Inference Acceleration via Efficient Operation Fusion
LLM Inference Acceleration via Efficient Operation Fusion
Mahsa Salmani
I. Soloveychik
64
0
0
24 Feb 2025
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
Sanghyun Yi
Qingfeng Liu
Mostafa El-Khamy
MQ
VGen
41
0
0
20 Feb 2025
The Impact of Inference Acceleration on Bias of LLMs
The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten
Ivan Habernal
Vedant Nanda
Muhammad Bilal Zafar
36
0
0
20 Feb 2025
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Yinghui Li
Haojing Huang
Jiayi Kuang
Yangning Li
Shu Guo
C. Qu
Xiaoyu Tan
Hai-Tao Zheng
Ying Shen
Philip S. Yu
CLL
66
5
0
11 Feb 2025
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Pablo Piantanida
Elisabeth Gassiat
47
0
0
10 Feb 2025
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Keyi Zhu
Jiajia Li
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
R. Lu
Zhaojian Li
85
0
0
03 Feb 2025
Choose Your Model Size: Any Compression by a Single Gradient Descent
Choose Your Model Size: Any Compression by a Single Gradient Descent
Martin Genzel
Patrick Putzky
Pengfei Zhao
S.
Mattes Mollenhauer
Robert Seidel
Stefan Dietzel
Thomas Wollmann
41
0
0
03 Feb 2025
Symmetric Pruning of Large Language Models
Symmetric Pruning of Large Language Models
Kai Yi
Peter Richtárik
AAML
VLM
57
0
0
31 Jan 2025
An Invitation to Neuroalgebraic Geometry
An Invitation to Neuroalgebraic Geometry
G. Marchetti
V. Shahverdi
Stefano Mereta
Matthew Trager
Kathlén Kohn
111
2
0
31 Jan 2025
Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach
Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach
Alireza Ghaffari
Sharareh Younesian
Boxing Chen
Vahid Partovi Nia
M. Asgharian
MQ
61
0
0
17 Jan 2025
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li
C. Bezemer
Ahmed E. Hassan
45
2
0
08 Jan 2025
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
Y. Park
Jake Hyun
Hojoon Kim
Jae W. Lee
MQ
41
0
0
31 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
119
1
0
18 Dec 2024
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
Cristobal Ortega
Yann Falevoz
Renaud Ayrignac
83
1
0
26 Nov 2024
Scaling Laws for Precision
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
13
0
07 Nov 2024
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Nai Chit Fung
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
Min-Hung Chen
MQ
31
0
0
28 Oct 2024
Catastrophic Failure of LLM Unlearning via Quantization
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang
Fali Wang
Xiaomin Li
Zongyu Wu
Xianfeng Tang
Hui Liu
Qi He
Wenpeng Yin
Suhang Wang
MU
34
5
0
21 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large
  Language Models
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
25
3
0
19 Oct 2024
Evaluating Quantized Large Language Models for Code Generation on
  Low-Resource Language Benchmarks
Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks
Enkhbold Nyamsuren
MQ
21
1
0
18 Oct 2024
PrivQuant: Communication-Efficient Private Inference with Quantized
  Network/Protocol Co-Optimization
PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
Tianshi Xu
Shuzhang Zhong
Wenxuan Zeng
Runsheng Wang
Meng Li
MQ
31
0
0
12 Oct 2024
CrossQuant: A Post-Training Quantization Method with Smaller
  Quantization Kernel for Precise Large Language Model Compression
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
Wenyuan Liu
Xindian Ma
Peng Zhang
Yan Wang
MQ
29
1
0
10 Oct 2024
Accelerating Error Correction Code Transformers
Accelerating Error Correction Code Transformers
Matan Levy
Yoni Choukroun
Lior Wolf
MQ
21
0
0
08 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song
Zixuan Pang
Wenhao Wang
Zihao Wang
XiaoFeng Wang
Hongbo Chen
Wei Song
Yier Jin
Dan Meng
Rui Hou
56
7
0
30 Sep 2024
Hyper-Compression: Model Compression via Hyperfunction
Hyper-Compression: Model Compression via Hyperfunction
Fenglei Fan
Juntong Fan
Dayang Wang
Jingbo Zhang
Zelin Dong
Shijun Zhang
Ge Wang
Tieyong Zeng
24
0
0
01 Sep 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
71
6
0
19 Aug 2024
123
Next