Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

8 October 2023
Cheng Zhang
Jianyi Cheng
Ilia Shumailov
George A. Constantinides
Yiren Zhao
MQ
arXiv: 2310.05079 (abs / PDF / HTML)
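The paper above studies block-based number formats for sub-8-bit LLM inference. As a rough illustration only (not the paper's exact formats), the sketch below quantises a weight vector in fixed-size blocks that each share one scale taken from the block's largest magnitude; the block size of 16, the 6-bit signed grid, and the helper name block_quantise are assumptions made for this example.

import numpy as np

def block_quantise(w, block_size=16, n_bits=6):
    """Quantise a 1-D weight vector in fixed-size blocks, each sharing one scale."""
    pad = (-len(w)) % block_size                     # zero-pad so the length divides evenly
    blocks = np.pad(w, (0, pad)).reshape(-1, block_size)
    qmax = 2 ** (n_bits - 1) - 1                     # signed grid, e.g. +/-31 for 6 bits
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                        # all-zero blocks: avoid division by zero
    q = np.clip(np.round(blocks / scales), -qmax, qmax)
    return (q * scales).reshape(-1)[: len(w)]        # dequantise and drop the padding

# Toy comparison: one outlier weight inflates a per-tensor scale,
# but only affects a single block under block-wise scaling.
rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
w[7] = 12.0                                          # injected outlier
scale = np.abs(w).max() / 31                         # single 6-bit scale for the whole tensor
per_tensor = np.clip(np.round(w / scale), -31, 31) * scale
print("mean |error|, per-tensor 6-bit :", np.abs(w - per_tensor).mean())
print("mean |error|, block-wise 6-bit :", np.abs(w - block_quantise(w)).mean())

With the outlier present, the per-tensor error should come out noticeably larger, which is the basic intuition behind fine-grained block scaling.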

Papers citing "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"

13 / 13 papers shown
1. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
   Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer
   MQ · 15 Aug 2022

2. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
   Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
   VLM, MQ · 04 Jun 2022

3. RoFormer: Enhanced Transformer with Rotary Position Embedding
   Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
   20 Apr 2021

4. VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
   Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany
   MQ · 08 Feb 2021

5. BinaryBERT: Pushing the Limit of BERT Quantization
   Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
   MQ · 31 Dec 2020

6. HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
   H. Habi, Roy H. Jennings, Arnon Netzer
   MQ · 20 Jul 2020

7. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
   Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, Kurt Keutzer
   MQ · 30 Nov 2018

8. HAQ: Hardware-Aware Automated Quantization with Mixed Precision
   Kuan-Chieh Wang, Zhijian Liu, Chengyue Wu, Ji Lin, Song Han
   MQ · 21 Nov 2018

9. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
   Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, G. Hua
   MQ · 26 Jul 2018

10. Quantizing deep convolutional networks for efficient inference: A whitepaper
    Raghuraman Krishnamoorthi
    MQ · 21 Jun 2018

11. Neural Network Acceptability Judgments
    Alex Warstadt, Amanpreet Singh, Samuel R. Bowman
    31 May 2018

12. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
    Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
    ELM · 20 Apr 2018

13. Pointer Sentinel Mixture Models
    Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
    RALM · 26 Sep 2016