ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
arXiv:2206.01861 · 4 June 2022
Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
Topics: VLM, MQ

Papers citing "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers"

Showing 50 of 324 citing papers.
• Scaling Laws for Sparsely-Connected Foundation Models · Elias Frantar, C. Riquelme, N. Houlsby, Dan Alistarh, Utku Evci · 15 Sep 2023
• DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices · Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, J. An, Shiwen Mao · 10 Sep 2023 · [ViT]
• LLMCad: Fast and Scalable On-device Large Language Model Inference · Daliang Xu, Wangsong Yin, Xin Jin, Wenjie Qu, Shiyun Wei, Mengwei Xu, Xuanzhe Liu · 08 Sep 2023
• Norm Tweaking: High-performance Low-bit Quantization of Large Language Models · Liang Li, Qingyuan Li, Bo-Wen Zhang, Xiangxiang Chu · 06 Sep 2023 · [MQ]
• QuantEase: Optimization-based Quantization for Language Models · Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, S. Keerthi, Rahul Mazumder · 05 Sep 2023 · [MQ]
• Memory Efficient Optimizers with 4-bit States · Bingrui Li, Jianfei Chen, Jun Zhu · 04 Sep 2023 · [MQ]
• FPTQ: Fine-grained Post-Training Quantization for Large Language Models · Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo-Wen Zhang, Xiangxiang Chu, Yerui Sun, Li-Qiang Du, Yuchen Xie · 30 Aug 2023 · [MQ]
• Uncovering the Hidden Cost of Model Compression · Diganta Misra, Muawiz Chaudhary, Agam Goyal, Bharat Runwal, Pin-Yu Chen · 29 Aug 2023 · [VLM]
• Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models · Kaiyuan Gao, Su He, Zhenyu He, Jiacheng Lin, Qizhi Pei, Jie Shao, Wei Zhang · 27 Aug 2023 · [LM&MA, SyDa]
• OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models · Wenqi Shao, Yonghong Tian, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqiang Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo · 25 Aug 2023 · [MQ]
• FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs · Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Awadalla · 16 Aug 2023 · [MQ]
• A Survey on Model Compression for Large Language Models · Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang · 15 Aug 2023
• Token-Scaled Logit Distillation for Ternary Weight Generative Language Models · Minsoo Kim, Sihwa Lee, Jangwhan Lee, S. Hong, Duhyeuk Chang, Wonyong Sung, Jungwook Choi · 13 Aug 2023 · [MQ]
• QuIP: 2-Bit Quantization of Large Language Models With Guarantees · Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Chris De Sa · 25 Jul 2023 · [MQ]
• A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization · Edward Fish, Umberto Michieli, Mete Ozay · 24 Jul 2023 · [MQ]
• ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats · Xiaoxia Wu, Z. Yao, Yuxiong He · 19 Jul 2023 · [MQ]
• Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation · Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, ..., Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi · 19 Jul 2023
• A Survey of Techniques for Optimizing Transformer Inference · Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani · 16 Jul 2023
• Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding · Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee · 12 Jul 2023
• QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models · Tommaso Pegolotti, Elias Frantar, Dan Alistarh, Markus Püschel · 07 Jul 2023 · [MQ]
• Transformers in Healthcare: A Survey · Subhash Nerella, S. Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, ..., Jessica Sena, B. Shickel, A. Bihorac, Kia Khezeli, Parisa Rashidi · 30 Jun 2023 · [MedIm, AI4CE]
• An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs · Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, ..., Hanwen Chang, Qun Gao, Zi. Wang, Guy Boudoukh, Moshe Wasserblat · 28 Jun 2023 · [MoE]
• H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models · Zhenyu (Allen) Zhang, Ying Sheng, Dinesh Manocha, Tianlong Chen, Lianmin Zheng, ..., Yuandong Tian, Christopher Ré, Clark W. Barrett, Zhangyang Wang, Beidi Chen · 24 Jun 2023 · [VLM]
• Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing · Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort · 22 Jun 2023 · [MQ]
• A Simple and Effective Pruning Approach for Large Language Models · Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter · 20 Jun 2023
• SqueezeLLM: Dense-and-Sparse Quantization · Sehoon Kim, Coleman Hooper, A. Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer · 13 Jun 2023 · [MQ]
• Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization · Clemens J. S. Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chia-Wei Chou, T. Jablin, Jian Li, Elfie Guo, Caitlin Stanton, S. Joshi, Yu Emma Wang · 08 Jun 2023 · [MQ]
• SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression · Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh · 05 Jun 2023 · [MQ]
• Temporal Dynamic Quantization for Diffusion Models · Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park · 04 Jun 2023 · [DiffM, MQ]
• AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han · 01 Jun 2023 · [EDL, MQ]
• FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization · J. H. Lee, Jeonghoon Kim, S. Kwon, Dongsoo Lee · 01 Jun 2023 · [MQ]
• Intriguing Properties of Quantization at Scale · Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker · 30 May 2023 · [MQ]
• PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models · Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan · 30 May 2023 · [MQ]
• LLM-QAT: Data-Free Quantization Aware Training for Large Language Models · Zechun Liu, Barlas Oğuz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra · 29 May 2023 · [MQ]
• Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals · Simo Ryu, S. Seo, Jaejun Yoo · 28 May 2023
• Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time · Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava · 26 May 2023
• Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers · Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann · 25 May 2023
• QLoRA: Efficient Finetuning of Quantized LLMs · Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer · 23 May 2023 · [ALM]
• Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization · Jeonghoon Kim, J. H. Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, S. Kwon, Dongsoo Lee · 23 May 2023 · [MQ]
• Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models · Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov · 23 May 2023
• Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models · Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu · 21 May 2023 · [MQ]
• LLM-Pruner: On the Structural Pruning of Large Language Models · Xinyin Ma, Gongfan Fang, Xinchao Wang · 19 May 2023
• A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation · Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa · 19 May 2023 · [ALM]
• SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification · Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, ..., Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia · 16 May 2023 · [LRM]
• Stable and low-precision training for large-scale vision-language models · Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt · 25 Apr 2023 · [MQ, MLLM, VLM]
• Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling · Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jian Ren, Zhengang Li · 18 Apr 2023 · [MQ]
• Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca · Yiming Cui, Ziqing Yang, Xin Yao · 17 Apr 2023 · [ALM]
• RPTQ: Reorder-based Post-training Quantization for Large Language Models · Zhihang Yuan, Lin Niu, Jia-Wen Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu · 03 Apr 2023 · [MQ]
• FP8 versus INT8 for efficient deep learning inference · M. V. Baalen, Andrey Kuzmin, Suparna S. Nair, Yuwei Ren, E. Mahurin, ..., Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph B. Soriaga, Tijmen Blankevoort · 31 Mar 2023 · [MQ]
• Unit Scaling: Out-of-the-Box Low-Precision Training · Charlie Blake, Douglas Orr, Carlo Luschi · 20 Mar 2023 · [MQ]