OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

25 August 2023
Wenqi Shao
Yonghong Tian
Zhaoyang Zhang
Peng Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Peng Gao
Yu Qiao
Ping Luo
    MQ
arXiv: 2308.13137

Papers citing "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models"

50 / 139 papers shown
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
71
5
0
15 Oct 2024
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
26
2
0
12 Oct 2024
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
Wei Liu
Jun Yao
MQ
79
4
0
12 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
37
1
0
10 Oct 2024
CrossQuant: A Post-Training Quantization Method with Smaller Quantization Kernel for Precise Large Language Model Compression
Wenyuan Liu
Xindian Ma
Peng Zhang
Yan Wang
MQ
29
1
0
10 Oct 2024
Scaling Laws for Mixed quantization in Large Language Models
Zeyu Cao
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Yiren Zhao
MQ
33
1
0
09 Oct 2024
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang
Jeffrey T. H. Wong
Can Xiao
George A. Constantinides
Yiren Zhao
MQ
47
2
0
08 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD
MQ
30
1
0
08 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li
Jianhang Hong
Yuanzhuo Wu
Snehal Adbol
Zonglin Li
MQ
26
1
0
06 Oct 2024
Accumulator-Aware Post-Training Quantization
Ian Colbert
Fabian Grob
Giuseppe Franco
Jinjie Zhang
Rayan Saab
MQ
30
3
0
25 Sep 2024
DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation
Xuewen Liu
Zhikai Li
Qingyi Gu
MQ
34
4
0
22 Sep 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
48
1
0
03 Sep 2024
Matmul or No Matmal in the Era of 1-bit LLMs
Jinendra Malekar
Mohammed E. Elbtity
Ramtin Zand
MQ
32
2
0
21 Aug 2024
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng
Songwei Liu
Yusheng Xie
Hong Liu
Xiaojian Wang
Miao Wei
Shu Yang
Fangmin Chen
Xing Mei
MQ
42
6
0
16 Aug 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao
Menghang Dong
Rongyu Zhang
Wenzhao Zheng
Yunpeng Zhang
Huanrui Yang
Dalong Du
Kurt Keutzer
Shanghang Zhang
51
0
0
15 Aug 2024
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
Leo Donisch
Sigurd Schacht
Carsten Lanquillon
30
2
0
06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong
Lujun Li
Dayou Du
Yuhan Chen
Zhenheng Tang
...
Wei Xue
Wenhan Luo
Qi-fei Liu
Yi-Ting Guo
Xiaowen Chu
MQ
50
4
0
03 Aug 2024
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
33
0
25 Jul 2024
Inverted Activations
Georgii Sergeevich Novikov
Ivan Oseledets
21
0
0
22 Jul 2024
A Survey on Symbolic Knowledge Distillation of Large Language Models
Kamal Acharya
Alvaro Velasquez
Haoze Song
SyDa
44
5
0
12 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Yonghong Tian
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Ping Luo
MQ
46
24
0
10 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
Li Du
Guoqi Li
Jiajun Zhang
55
5
0
05 Jul 2024
RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs
Changhai Zhou
Shijie Han
Shiyang Zhang
Shichao Weng
Zekai Liu
Cheng Jin
45
1
0
22 Jun 2024
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng
Wen Hu
Zhi Wang
Hongen Peng
Jianguo Li
Sheng Zhang
57
7
0
19 Jun 2024
BoA: Attention-aware Post-training Quantization without Backpropagation
Junhan Kim
Ho-Young Kim
Eulrang Cho
Chungman Lee
Joonyoung Kim
Yongkweon Jeon
MQ
38
0
0
19 Jun 2024
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
Dongwon Jo
Taesu Kim
Yulhwa Kim
Jae-Joon Kim
52
3
0
18 Jun 2024
QTIP: Quantization with Trellises and Incoherence Processing
Albert Tseng
Qingyao Sun
David Hou
Christopher De Sa
MQ
40
10
0
17 Jun 2024
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu
Yuan Meng
Fang Wu
Shenhao Peng
Hang Yao
Chaoyu Guan
Chen Tang
Xinzhu Ma
Zhi Wang
Wenwu Zhu
MQ
58
7
0
15 Jun 2024
TernaryLLM: Ternarized Large Language Model
Tianqi Chen
Zhe Li
Weixiang Xu
Zeyu Zhu
Dong Li
Lu Tian
E. Barsoum
Peisong Wang
Jian Cheng
36
7
0
11 Jun 2024
Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko
Riccardo Del Chiaro
Markus Nagel
MQ
33
10
0
10 Jun 2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You
Yipin Guo
Yichao Fu
Wei Zhou
Huihong Shi
Xiaofan Zhang
Souvik Kundu
Amir Yazdanbakhsh
Y. Lin
KELM
56
7
0
10 Jun 2024
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng
Wen Hu
Zhi Wang
Peng Du
Jianguo Li
Sheng Zhang
42
10
0
07 Jun 2024
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
Aozhong Zhang
Naigang Wang
Yanxia Deng
Xin Li
Zi Yang
Penghang Yin
MQ
37
5
0
02 Jun 2024
LCQ: Low-Rank Codebook based Quantization for Large Language Models
Wen-Pu Cai
Wu-Jun Li
MQ
46
0
0
31 May 2024
STAT: Shrinking Transformers After Training
Megan Flynn
Alexander Wang
Dean Edward Alvarez
Christopher De Sa
Anil Damle
36
2
0
29 May 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu
Yuan Cheng
Dawei Yang
Zhihang Yuan
Jiangyong Yu
Chen Xu
Sifan Zhou
MQ
36
7
0
28 May 2024
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Haoyu Wang
Bei Liu
Hang Shao
Bo Xiao
Ke Zeng
Guanglu Wan
Yanmin Qian
MQ
29
0
0
27 May 2024
LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke
M. Toftrup
M. Kastoryano
Serge Belongie
Vésteinn Snæbjarnarson
MQ
42
0
0
26 May 2024
PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu
Haoxuan Wang
Yuzhang Shang
Mubarak Shah
Yan Yan
MQ
33
19
0
25 May 2024
Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information
Yanshu Wang
Wenyang He
Tong Yang
MQ
15
1
0
24 May 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
Vladimir Malinovskii
Denis Mazur
Ivan Ilin
Denis Kuznedelev
Konstantin Burlachenko
Kai Yi
Dan Alistarh
Peter Richtárik
MQ
37
19
0
23 May 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Xianglong Liu
Luca Benini
Michele Magno
Xiaojuan Qi
MQ
67
15
0
23 May 2024
Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs
Qingyuan Li
Ran Meng
Yiduo Li
Bo Zhang
Yifan Lu
Yerui Sun
Lin Ma
Yuchen Xie
MQ
38
0
0
23 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
47
12
0
23 May 2024
OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati
Alireza Ghaffari
M. Asgharian
Lu Hou
Boxing Chen
Vahid Partovi Nia
MQ
86
0
0
23 May 2024
ReALLM: A general framework for LLM compression and fine-tuning
Louis Leconte
Lisa Bedin
Van Minh Nguyen
Eric Moulines
MQ
41
0
0
21 May 2024
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Haojie Duanmu
Zhihang Yuan
Xiuhong Li
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
42
19
0
10 May 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
90
76
0
07 May 2024