Training Deep Neural Networks with 8-bit Floating Point Numbers (arXiv:1812.08011)
19 December 2018
Naigang Wang
Jungwook Choi
D. Brand
Chia-Yu Chen
K. Gopalakrishnan
MQ
Papers citing "Training Deep Neural Networks with 8-bit Floating Point Numbers" (50 / 71 papers shown)
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Weixi Zhang
MQ
57
0
0
01 May 2025
Stochastic Rounding for LLM Training: Theory and Practice
Kaan Ozkara
Tao Yu
Youngsuk Park
43
0
0
27 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu Zhang
Gaojie Jin
Xianrui Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
39
0
0
24 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
ALM
MQ
90
0
0
18 Feb 2025
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yunfan Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Yuezhou Hu
Jun-Jie Zhu
Jianfei Chen
45
0
0
13 Sep 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
61
9
0
24 Jul 2024
Towards Federated Learning with On-device Training and Communication in 8-bit Floating Point
Bokun Wang
Axel Berg
D. A. E. Acar
Chuteng Zhou
FedML
MQ
48
0
0
02 Jul 2024
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
Yaniv Blumenfeld
Itay Hubara
Daniel Soudry
47
3
0
25 Jan 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun
Defang Chen
Jiawei Chen
Yan Feng
Chun-Yen Chen
Can Wang
31
0
0
11 Jan 2024
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill
Sourav Dutta
VLM
MQ
42
1
0
12 Jul 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li
Chunlin Tian
Kahou Tam
Ruirui Ma
Li Li
23
2
0
17 Jun 2023
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong
Zheng Liu
Xiangshan Chen
ViT
44
17
0
21 Apr 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen
Andrey Kuzmin
Suparna S. Nair
Yuwei Ren
E. Mahurin
...
Sundar Subramanian
Sanghyuk Lee
Markus Nagel
Joseph B. Soriaga
Tijmen Blankevoort
MQ
31
45
0
31 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
Charlie Blake
Douglas Orr
Carlo Luschi
MQ
24
7
0
20 Mar 2023
With Shared Microexponents, A Little Shifting Goes a Long Way
Bita Darvish Rouhani
Ritchie Zhao
V. Elango
Rasoul Shafipour
Mathew Hall
...
Eric S. Chung
Zhaoxia Deng
S. Naghshineh
Jongsoo Park
Maxim Naumov
MQ
43
36
0
16 Feb 2023
PDPU: An Open-Source Posit Dot-Product Unit for Deep Learning Applications
Qiong Li
Chao Fang
Zhongfeng Wang
21
4
0
03 Feb 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality
Lu Xia
M. Hochstenbach
Stefano Massei
27
2
0
23 Jan 2023
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
38
5
0
19 Dec 2022
Numerical Stability of DeepGOPlus Inference
Inés Gonzalez Pepe
Yohan Chatelain
Gregory Kiar
Tristan Glatard
BDL
24
2
0
13 Dec 2022
AskewSGD: An Annealed interval-constrained Optimisation method to train Quantized Neural Networks
Louis Leconte
S. Schechtman
Eric Moulines
29
4
0
07 Nov 2022
Emergent Quantized Communication
Boaz Carmeli
Ron Meir
Yonatan Belinkov
MQ
AI4CE
28
8
0
04 Nov 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
31
8
0
24 Oct 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
M. Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
77
122
0
12 Sep 2022
FP8 Quantization: The Power of the Exponent
Andrey Kuzmin
M. V. Baalen
Yuwei Ren
Markus Nagel
Jorn W. T. Peters
Tijmen Blankevoort
MQ
25
80
0
19 Aug 2022
Productivity meets Performance: Julia on A64FX
Mosè Giordano
Milan Klöwer
Valentin Churavy
16
11
0
26 Jul 2022
8-bit Numerical Formats for Deep Neural Networks
Badreddine Noune
Philip Jones
Daniel Justus
Dominic Masters
Carlo Luschi
MQ
23
33
0
06 Jun 2022
Neural Architecture Search using Property Guided Synthesis
Charles Jin
P. Phothilimthana
Sudip Roy
27
6
0
08 May 2022
FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support
Seock-Hwan Noh
Jahyun Koo
Seunghyun Lee
Jongse Park
Jaeha Kung
AI4CE
32
17
0
13 Mar 2022
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
Brian Chmiel
Ron Banner
Elad Hoffer
Hilla Ben Yaacov
Daniel Soudry
MQ
33
23
0
19 Dec 2021
LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
Berivan Isik
P. Chou
S. Hwang
Nick Johnston
G. Toderici
3DPC
34
28
0
17 Nov 2021
DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning
Robert Hönig
Yiren Zhao
Robert D. Mullins
FedML
109
54
0
31 Oct 2021
Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications
Young D. Kwon
Jagmohan Chauhan
Abhishek Kumar
Pan Hui
Cecilia Mascolo
CLL
HAI
28
32
0
25 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
34
273
0
06 Oct 2021
Reducing numerical precision preserves classification accuracy in Mondrian Forests
Marc Vicuna
Martin Khannouz
Gregory Kiar
Yohan Chatelain
Tristan Glatard
MQ
11
3
0
28 Jun 2021
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Jiawei Zhao
Steve Dai
Rangharajan Venkatesan
Brian Zimmer
Mustafa Ali
Xuan Li
Brucek Khailany
B. Dally
Anima Anandkumar
MQ
39
13
0
26 Jun 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen
Lianmin Zheng
Z. Yao
Dequan Wang
Ion Stoica
Michael W. Mahoney
Joseph E. Gonzalez
MQ
27
74
0
29 Apr 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
29
66
0
21 Apr 2021
Charged particle tracking via edge-classifying interaction networks
G. Dezoort
S. Thais
Javier Mauricio Duarte
Vesal Razavimaleki
M. Atkinson
I. Ojalvo
Mark S. Neubauer
P. Elmer
25
46
0
30 Mar 2021
Distribution Adaptive INT8 Quantization for Training CNNs
Kang Zhao
Sida Huang
Pan Pan
Yinghan Li
Yingya Zhang
Zhenyu Gu
Yinghui Xu
MQ
24
62
0
09 Feb 2021
Enabling Binary Neural Network Training on the Edge
Erwei Wang
James J. Davis
Daniele Moro
Piotr Zielinski
Jia Jie Lim
C. Coelho
S. Chatterjee
P. Cheung
George A. Constantinides
MQ
20
24
0
08 Feb 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang
C. Glossner
Lei Wang
Shaobo Shi
Xiaotong Zhang
MQ
150
675
0
24 Jan 2021
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
Y. Fu
Haoran You
Yang Katie Zhao
Yue Wang
Chaojian Li
K. Gopalakrishnan
Zhangyang Wang
Yingyan Lin
MQ
38
32
0
24 Dec 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
Jianfei Chen
Yujie Gai
Z. Yao
Michael W. Mahoney
Joseph E. Gonzalez
MQ
17
58
0
27 Oct 2020
FPRaker: A Processing Element For Accelerating Neural Network Training
Omar Mohamed Awad
Mostafa Mahmoud
Isak Edo Vivancos
Ali Hadi Zadeh
Ciaran Bannon
Anand Jayarajan
Gennady Pekhimenko
Andreas Moshovos
25
15
0
15 Oct 2020
AQD: Towards Accurate Fully-Quantized Object Detection
Peng Chen
Jing Liu
Bohan Zhuang
Mingkui Tan
Chunhua Shen
MQ
29
10
0
14 Jul 2020
Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
A. Rajagopal
D. A. Vink
Stylianos I. Venieris
C. Bouganis
MQ
16
14
0
16 Jun 2020
Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors
C. Coelho
Aki Kuusela
Shane Li
Zhuang Hao
T. Aarrestad
Vladimir Loncar
J. Ngadiuba
M. Pierini
Adrian Alan Pol
S. Summers
MQ
32
175
0
15 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE
45
98
0
05 Jun 2020