And the Bit Goes Down: Revisiting the Quantization of Neural Networks
Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
arXiv:1907.05686 · 12 July 2019 · MQ
Papers citing "And the Bit Goes Down: Revisiting the Quantization of Neural Networks" (33 papers)
An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar
17 Jun 2024

GPTVQ: The Blessing of Dimensionality for LLM Quantization
M. V. Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cédric Bastoul, E. Mahurin, Tijmen Blankevoort, Paul N. Whatmough
23 Feb 2024 · MQ

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
02 Sep 2023 · MQ

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta
12 Jul 2023 · VLM, MQ

PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah
25 May 2023 · MQ

NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Shishira R. Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava
30 Dec 2022 · VGen

Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu, X. Chen, Chen-li Ma, Xue Liu
24 Dec 2022 · MQ

Weight Fixing Networks
Christopher Subia-Waud, S. Dasmahapatra
24 Oct 2022 · MQ

Deep learning model compression using network sensitivity and gradients
M. Sakthi, N. Yadla, Raj Pawate
11 Oct 2022

Look-ups are not (yet) all you need for deep learning inference
Calvin McCarter, Nicholas Dronen
12 Jul 2022

LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava
06 Apr 2022

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
S. Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag I. Patel, Abhijit Khobare
20 Jan 2022 · MQ

Implicit Neural Video Compression
Yunfan Zhang, T. V. Rozendaal, Johann Brehmer, Markus Nagel, Taco S. Cohen
21 Dec 2021

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision
Rui Han, Qinglong Zhang, C. Liu, Guoren Wang, Jian Tang, L. Chen
18 Dec 2021

Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression
Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei A. Zaharia
19 Nov 2021

LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
Berivan Isik, P. Chou, S. Hwang, Nick Johnston, G. Toderici
17 Nov 2021 · 3DPC

RED++: Data-Free Pruning of Deep Neural Networks via Input Splitting and Output Merging
Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kévin Bailly
30 Sep 2021

A New Clustering-Based Technique for the Acceleration of Deep Convolutional Networks
Erion-Vasilis M. Pikoulis, C. Mavrokefalidis, Aris S. Lalos
19 Jul 2021

A White Paper on Neural Network Quantization
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort
15 Jun 2021 · MQ

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
14 Jun 2021 · AIFin, MQ, AI4MH

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen, Lianmin Zheng, Z. Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
29 Apr 2021 · MQ

Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
20 Apr 2021 · DiffM, MQ

Knowledge Distillation as Semiparametric Inference
Tri Dao, G. Kamath, Vasilis Syrgkanis, Lester W. Mackey
20 Apr 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany
08 Feb 2021 · MQ

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms
Rishabh Goyal, Joaquin Vanschoren, V. V. Acht, S. Nijssen
03 Feb 2021 · MQ

Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
Julieta Martinez, Jashan Shewakramani, Ting Liu, Ioan Andrei Bârsan, Wenyuan Zeng, R. Urtasun
29 Oct 2020 · MQ

TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
27 Sep 2020 · MQ

Transform Quantization for CNN (Convolutional Neural Network) Compression
Sean I. Young, Wang Zhe, David S. Taubman, B. Girod
02 Sep 2020 · MQ

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
14 Aug 2020

FrostNet: Towards Quantization-Aware Network Architecture Search
Taehoon Kim, Y. Yoo, Jihoon Yang
17 Jun 2020 · MQ

An Overview of Neural Network Compression
James O'Neill
05 Jun 2020 · AI4CE

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, S. Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee
20 May 2020 · MQ

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
10 Feb 2017 · MQ