Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.05015
Cited By
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
8 October 2023
Song Guo
Jiahang Xu
Li Zhang
Mao Yang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models"
21 / 21 papers shown
Title
CUT: Pruning Pre-Trained Multi-Task Models into Compact Models for Edge Devices
Jingxuan Zhou
Weidong Bao
Ji Wang
Zhengyi Zhong
32
0
0
14 Apr 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Yong-Jin Liu
Jing Lin
Yiwu Yao
Rongrong Ji
97
1
0
21 Feb 2025
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
38
2
0
24 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
119
1
0
18 Dec 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Zhilin Wang
MoE
45
4
0
24 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
42
2
0
23 Oct 2024
On-Device LLMs for SMEs: Challenges and Opportunities
Jeremy Stephen Gabriel Yee
Pai Chet Ng
Zhengkui Wang
Ian McLoughlin
Aik Beng Ng
Simon See
29
1
0
21 Oct 2024
SparseDM: Toward Sparse Efficient Diffusion Models
Kafeng Wang
Jianfei Chen
He Li
Zhenpeng Mi
Jun-Jie Zhu
DiffM
70
8
0
16 Apr 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
58
82
0
26 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
41
48
0
15 Feb 2024
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery
Steven Kolawole
Jean-Francois Kagey
Virginia Smith
Graham Neubig
Ameet Talwalkar
52
28
0
08 Feb 2024
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
171
591
0
06 Apr 2023
Transkimmer: Transformer Learns to Layer-wise Skim
Yue Guan
Zhengyi Li
Jingwen Leng
Zhouhan Lin
Minyi Guo
80
38
0
15 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
384
12,081
0
04 Mar 2022
Cyclical Pruning for Sparse Neural Networks
Suraj Srinivas
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Andrii Skliar
Tijmen Blankevoort
43
13
0
02 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
447
8,699
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
218
1,664
0
15 Oct 2021
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
59
111
0
16 Feb 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
107
345
0
05 Jan 2021
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,591
0
21 Jan 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
236
578
0
12 Sep 2019
1