FP8 versus INT8 for efficient deep learning inference
arXiv:2303.17951 · 31 March 2023
Mart van Baalen, Andrey Kuzmin, Suparna S. Nair, Yuwei Ren, Eric Mahurin, Chirag I. Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph B. Soriaga, Tijmen Blankevoort
[MQ]

Papers citing "FP8 versus INT8 for efficient deep learning inference" (28 papers)

Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
13 May 2025 · [MQ] · 0 citations

Improving Quantization with Post-Training Model Expansion
Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas J. Fraser, Michaela Blott
21 Mar 2025 · [MQ] · 2 citations

FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
Ruichen Chen, Keith G. Mills, Di Niu
19 Mar 2025 · [MQ] · 0 citations

GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou, Shuo Wang, Zhihang Yuan, Mingjia Shi, Yuzhang Shang, Dawei Yang
18 Feb 2025 · [ALM, MQ] · 0 citations

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
Shimao Chen, Zirui Liu, Zhiying Wu, Ce Zheng, Peizhuang Cong, Zihan Jiang, Yuhan Wu, Lei Su, Tong Yang
25 Sep 2024 · [MQ, VLM] · 3 citations

Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu
24 Sep 2024 · [ObjD, LRM] · 36 citations

Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things
Ziheng Wang, Pedro Reviriego, Farzad Niknia, Javier Conde, Shanshan Liu, Fabrizio Lombardi
26 Aug 2024 · [MQ] · 2 citations

Scalify: scale propagation for efficient low-precision LLM training
Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon
24 Jul 2024 · 2 citations

Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers
Vincenzo Liguori
09 Jun 2024 · 0 citations

Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
31 May 2024 · [MQ] · 5 citations

PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control
Jasper Hoffmann, Diego Fernandez Clausen, Julien Brosseit, Julian Bernhard, Klemens Esterle, M. Werling, Michael Karg, Joschka Boedecker
29 Apr 2024 · 1 citation

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim
04 Apr 2024 · [MQ] · 8 citations

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
Yaniv Blumenfeld, Itay Hubara, Daniel Soudry
25 Jan 2024 · 3 citations

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Xiaoxia Wu, Haojun Xia, Stephen Youn, Zhen Zheng, Shiyang Chen, ..., Reza Yazdani Aminabadi, Yuxiong He, Olatunji Ruwase, Leon Song, Zhewei Yao
14 Dec 2023 · 8 citations

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM
Saeed Maleki
09 Oct 2023 · [VLM] · 4 citations

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou
07 Oct 2023 · [MQ] · 5 citations

Hadamard Domain Training with Integers for Class Incremental Quantized Learning
Martin Schiemer, Clemens J. S. Schaefer, Jayden Parker Vap, Mark Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi
05 Oct 2023 · 2 citations

Training and inference of large language models using 8-bit floating point
Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Prashanth Krishnamurthy, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon
29 Sep 2023 · [MQ] · 18 citations

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
Xiaoxia Wu, Z. Yao, Yuxiong He
19 Jul 2023 · [MQ] · 43 citations

A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023 · 62 citations

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
22 Jun 2023 · [MQ] · 88 citations

Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
Simla Burcu Harma, Canberk Sonmez, Nicholas Sperry, Babak Falsafi, Martin Jaggi, Yunho Oh
19 Nov 2022 · [MQ] · 4 citations

FP8 Formats for Deep Learning
Paulius Micikevicius, Dusan Stosic, N. Burgess, Marius Cornea, Pradeep Dubey, ..., Naveen Mellempudi, S. Oberman, M. Shoeybi, Michael Siu, Hao Wu
12 Sep 2022 · [BDL, VLM, MQ] · 122 citations

Overcoming Oscillations in Quantization-Aware Training
Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
21 Mar 2022 · [MQ] · 101 citations

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
Yukuan Yang, Shuang Wu, Lei Deng, Tianyi Yan, Yuan Xie, Guoqi Li
05 Sep 2019 · [MQ] · 110 citations

Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, ..., Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
20 Aug 2019 · 3,531 citations

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · [ELM] · 6,959 citations

ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
01 Sep 2014 · [VLM, ObjD] · 39,198 citations