OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (arXiv: 2306.02272)

Changhun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park · MQ · 4 June 2023

Papers citing "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models"

35 / 35 papers shown
• Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
  Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost · MQ · 12 May 2025
• Radio: Rate-Distortion Optimization for Large Language Model Compression
  Sean I. Young · MQ · 05 May 2025
• Fast and Low-Cost Genomic Foundation Models via Outlier Removal
  Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu · AAML · 01 May 2025
• FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs
  Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, S. Zheng, Xiangrong Xu · MQ · 28 Apr 2025
• Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
  Deyu Cao, Samin Aref · MQ · 14 Apr 2025
• RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm
  Yongyi Yang, Jianyang Gao, Wei Hu · MQ · 29 Mar 2025
• Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
  Minsu Kim, Seongmin Hong, RyeoWook Ko, S. Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park · 24 Mar 2025
• HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration
  Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, L. Peh · 27 Feb 2025
• Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
  Sanghyun Yi, Qingfeng Liu, Mostafa El-Khamy · MQ, VGen · 20 Feb 2025
• Optimizing Large Language Model Training Using FP4 Quantization
  Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng · MQ · 28 Jan 2025
• Irrational Complex Rotations Empower Low-bit Optimizers
  Zhen Tian, Wayne Xin Zhao, Zhicheng Dou · MQ · 22 Jan 2025
• Pushing the Envelope of Low-Bit LLM via Dynamic Error Compensation
  Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee · MQ · 31 Dec 2024
• PTQ4VM: Post-Training Quantization for Visual Mamba
  Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park · MQ, Mamba · 29 Dec 2024
• The Super Weight in Large Language Models
  Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan · MQ, MILM · 11 Nov 2024
• Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs
  Zifei Xu, Sayeh Sharify, W. Yazar, T. Webb, Xin Wang · MQ · 18 Oct 2024
• FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
  Akriti Jain, Saransh Sharma, Koyel Mukherjee, Soumyabrata Pal · 16 Oct 2024
• Channel-Wise Mixed-Precision Quantization for Large Language Models
  Zihan Chen, Bike Xie, Jundong Li, Cong Shen · MQ · 16 Oct 2024
• Scaling laws for post-training quantized large language models
  Zifei Xu, Alexander Lan, W. Yazar, T. Webb, Sayeh Sharify, Xin Wang · MQ · 15 Oct 2024
• QSpec: Speculative Decoding with Complementary Quantization Schemes
  Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu · MQ · 15 Oct 2024
• ARB-LLM: Alternating Refined Binarizations for Large Language Models
  Zhiteng Li, Xinyu Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang · MQ · 04 Oct 2024
• VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
  Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang · MQ · 25 Sep 2024
• OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
  Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung · MQ · 06 Sep 2024
• Foundations of Large Language Model Compression -- Part 1: Weight Quantization
  Sean I. Young · MQ · 03 Sep 2024
• Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
  Leo Donisch, Sigurd Schacht, Carsten Lanquillon · 06 Aug 2024
• STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
  Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, ..., Wei Xue, Wenhan Luo, Qi-fei Liu, Yi-Ting Guo, Xiaowen Chu · MQ · 03 Aug 2024
• Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
  Dongwon Jo, Taesu Kim, Yulhwa Kim, Jae-Joon Kim · 18 Jun 2024
• Low-Rank Quantization-Aware Training for LLMs
  Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel · MQ · 10 Jun 2024
• CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
  Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Guohao Li · 07 Jun 2024
• Effective Interplay between Sparsity and Quantization: From Theory to Practice
  Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh · MQ · 31 May 2024
• CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
  Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian · MQ · 27 May 2024
• PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
  Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtárik · MQ · 23 May 2024
• Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
  Yeonhong Park, Jake Hyun, SangLyul Cho, Bonggeun Sim, Jae W. Lee · MQ · 16 Feb 2024
• QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
  Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim · 15 Feb 2024
• Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
  Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi · LRM · 03 Feb 2024
• A Survey on Model Compression for Large Language Models
  Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang · 15 Aug 2023