Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim, J. H. Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, S. Kwon, Dongsoo Lee
arXiv 2305.14152 · 23 May 2023 · [MQ]
Papers citing "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization" (50 of 75 papers shown):
- Diffusion Model Quantization: A Review. Qian Zeng, Chenggong Hu, Mingli Song, Jie Song. 08 May 2025. [MQ]
- Efficient Personalization of Quantized Diffusion Model without Backpropagation. H. Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun. 19 Mar 2025. [DiffM, MQ]
- ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning. Baohao Liao, Christian Herold, Seyyed Hadi Hashemi, Stefan Vasilev, Shahram Khadivi, Christof Monz. 17 Mar 2025. [MQ]
- Sample-aware Adaptive Structured Pruning for Large Language Models. Jun Kong, Xinge Ma, Jin Wang, Xuejie Zhang. 08 Mar 2025.
- LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM. Yehonathan Refael, Iftach Arbel, Ofir Lindenbaum, Tom Tirer. 26 Feb 2025.
- The Scaling Law for LoRA Base on Mutual Information Upper Bound. Jing Zhang, Hui Gao, Peng Zhang, Shuzhen Sun, Chang Yang, Yuexian Hou. 06 Jan 2025.
- AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning. Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, Ofir Lindenbaum. 31 Dec 2024.
- Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation. Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee. 31 Dec 2024. [MQ]
- Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference. Jorge García-Carrasco, A. Maté, Juan Trujillo. 20 Dec 2024.
- Deploying Foundation Model Powered Agent Services: A Survey. Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen. 18 Dec 2024. [AI4CE]
- CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models. Amitash Nanda, Sree Bhargavi Balija, D. Sahoo. 03 Dec 2024. [MQ]
- Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies. L. Wang, Sheng Chen, Linnan Jiang, Shu Pan, Runze Cai, Sen Yang, Fei Yang. 24 Oct 2024.
- SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training. Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, ..., Baixi Sun, Haibin Lin, Zhi-Li Zhang, Xin Liu, Dingwen Tao. 20 Oct 2024. [MQ]
- Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning. H. Fernando, Han Shen, Parikshit Ram, Yi Zhou, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen. 20 Oct 2024. [CLL]
- Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs. Zifei Xu, Sayeh Sharify, W. Yazar, T. Webb, Xin Eric Wang. 18 Oct 2024. [MQ]
- t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving. Pengfei Hu, Yuhang Qian, Tianyue Zheng, Ang Li, Zhe Chen, Yue Gao, Xiuzhen Cheng, Jun-Jie Luo. 13 Oct 2024.
- Propulsion: Steering LLM with Tiny Fine-Tuning. Md. Kowsher, Nusrat Jahan Prottasha, Prakash Bhat. 17 Sep 2024.
- Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation. Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji. 07 Aug 2024. [MQ]
- Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations. Leo Donisch, Sigurd Schacht, Carsten Lanquillon. 06 Aug 2024.
- Performance of Recent Large Language Models for a Low-Resourced Language. Ravindu Jayakody, Gihan Dias. 31 Jul 2024.
- Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models. Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson. 25 Jul 2024.
- Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance. Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li. 24 Jul 2024. [ALM, MQ]
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices. Jung Hyun Lee, Jeonghoon Kim, J. Yang, S. Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee. 16 Jul 2024. [MQ]
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu (Allen) Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang. 15 Jul 2024.
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models. Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo. 10 Jul 2024. [MQ]
- Towards Automating Text Annotation: A Case Study on Semantic Proximity Annotation using GPT-4. Sachin Yadav, Tejaswi Choppa, Dominik Schlechtweg. 04 Jul 2024. [VLM]
- Robust Adaptation of Foundation Models with Black-Box Visual Prompting. Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song. 04 Jul 2024. [VLM]
- VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework. Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia. 19 Jun 2024.
- Low-Rank Quantization-Aware Training for LLMs. Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel. 10 Jun 2024. [MQ]
- Federated LoRA with Sparse Communication. Kevin Kuo, Arian Raje, Kousik Rajesh, Virginia Smith. 07 Jun 2024.
- Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning. Naibin Gu, Peng Fu, Xiyu Liu, Bowen Shen, Zheng-Shen Lin, Weiping Wang. 06 Jun 2024.
- Achieving Sparse Activation in Small Language Models. Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao. 03 Jun 2024.
- LCQ: Low-Rank Codebook based Quantization for Large Language Models. Wen-Pu Cai, Wu-Jun Li. 31 May 2024. [MQ]
- To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability. Joonhyung Lee, Jeongin Bae, Byeongwook Kim, S. Kwon, Dongsoo Lee. 29 May 2024. [MQ]
- OAC: Output-adaptive Calibration for Accurate Post-training Quantization. Ali Edalati, Alireza Ghaffari, M. Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia. 23 May 2024. [MQ]
- ReALLM: A general framework for LLM compression and fine-tuning. Louis Leconte, Lisa Bedin, Van Minh Nguyen, Eric Moulines. 21 May 2024. [MQ]
- HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models. Aakash Tripathi, Asim Waqas, Yasin Yilmaz, Ghulam Rasool. 13 May 2024.
- Quantifying the Capabilities of LLMs across Scale and Precision. Sher Badshah, Hassan Sajjad. 06 May 2024.
- How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training. Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag I. Patel, Markus Nagel. 25 Apr 2024. [MQ]
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping. Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella. 05 Apr 2024.
- Minimize Quantization Output Error with Bias Compensation. Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li. 02 Apr 2024. [MQ]
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey. Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang. 21 Mar 2024.
- DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation. Sunghyeon Woo, Baeseong Park, Byeongwook Kim, Minjung Jo, S. Kwon, Dongsuk Jeon, Dongsoo Lee. 27 Feb 2024.
- A Comprehensive Evaluation of Quantization Strategies for Large Language Models. Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong. 26 Feb 2024. [MQ]
- LLM Inference Unveiled: Survey and Roofline Model Insights. Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, ..., Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer. 26 Feb 2024.
- A Survey on Knowledge Distillation of Large Language Models. Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou. 20 Feb 2024. [KELM, VLM]
- OneBit: Towards Extremely Low-bit Large Language Models. Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che. 17 Feb 2024. [MQ]
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs. Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee. 16 Feb 2024. [MQ]
- Model Compression and Efficient Inference for Large Language Models: A Survey. Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. 15 Feb 2024. [MQ]
- Accurate LoRA-Finetuning Quantization of LLMs via Information Retention. Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno. 08 Feb 2024. [MQ]