Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.01089
Cited By
v1
v2
v3
v4 (latest)
RPTQ: Reorder-based Post-training Quantization for Large Language Models
3 April 2023
Zhihang Yuan
Lin Niu
Jia-Wen Liu
Wenyu Liu
Xinggang Wang
Yuzhang Shang
Guangyu Sun
Qiang Wu
Jiaxiang Wu
Bingzhe Wu
MQ
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"RPTQ: Reorder-based Post-training Quantization for Large Language Models"
23 / 23 papers shown
Title
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
Zhihang Cai
Xingjun Zhang
Zhendong Tan
Zheng Wei
MQ
183
0
0
22 May 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
439
0
0
28 Mar 2025
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng
Xiuping Cui
Size Zheng
Maoliang Li
Jiayu Chen
Yun Liang
Xiang Chen
MQ
MoE
106
0
0
27 Mar 2025
Q-VLM: Post-training Quantization for Large Vision-Language Models
Changyuan Wang
Ziwei Wang
Xiuwei Xu
Yansong Tang
Jie Zhou
Jiwen Lu
MQ
77
6
0
10 Oct 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
Li Du
Guoqi Li
Jiajun Zhang
103
5
0
05 Jul 2024
LCQ: Low-Rank Codebook based Quantization for Large Language Models
Wen-Pu Cai
Wu-Jun Li
Wu-Jun Li
MQ
92
0
0
31 May 2024
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
83
17
0
13 Dec 2023
Large Language Models with Controllable Working Memory
Daliang Li
A. S. Rawat
Manzil Zaheer
Xin Wang
Michal Lukasik
Andreas Veit
Felix X. Yu
Surinder Kumar
KELM
118
169
0
09 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
392
2,388
0
09 Nov 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
349
1,094
0
05 Oct 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
48
60
0
30 Aug 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
98
653
0
15 Aug 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
122
479
0
04 Jun 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
176
835
0
14 Apr 2022
BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
Yuhang Li
Ruihao Gong
Xu Tan
Yang Yang
Peng Hu
Qi Zhang
F. Yu
Wei Wang
Shi Gu
MQ
133
438
0
10 Feb 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai
Rangharajan Venkatesan
Haoxing Ren
B. Zimmer
W. Dally
Brucek Khailany
MQ
84
71
0
08 Feb 2021
Up or Down? Adaptive Rounding for Post-Training Quantization
Markus Nagel
Rana Ali Amjad
M. V. Baalen
Christos Louizos
Tijmen Blankevoort
MQ
88
585
0
22 Apr 2020
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
227
1,527
0
24 May 2019
Learned Step Size Quantization
S. K. Esser
J. McKinstry
Deepika Bablani
R. Appuswamy
D. Modha
MQ
75
806
0
21 Feb 2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Ritchie Zhao
Yuwei Hu
Jordan Dotzel
Christopher De Sa
Zhiru Zhang
OODD
MQ
97
311
0
28 Jan 2019
PACT: Parameterized Clipping Activation for Quantized Neural Networks
Jungwook Choi
Zhuo Wang
Swagath Venkataramani
P. Chuang
Vijayalakshmi Srinivasan
K. Gopalakrishnan
MQ
65
954
0
16 May 2018
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
328
2,876
0
26 Sep 2016
Binarized Neural Networks
Itay Hubara
Daniel Soudry
Ran El-Yaniv
MQ
195
1,347
0
08 Feb 2016
1