ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:1712.05877 · Cited By
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ
ArXiv · PDF · HTML

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,258 papers shown
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
58
48
0
08 Apr 2024
Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
Jan Klhufek
Miroslav Safar
Vojtěch Mrázek
Z. Vašíček
Lukás Sekanina
MQ
32
1
0
08 Apr 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha
Mayank Mishra
Naigang Wang
Dan Alistarh
Yikang Shen
Yoon Kim
MQ
68
8
0
04 Apr 2024
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Chee Hong
Kyoung Mu Lee
SupR
MQ
26
2
0
04 Apr 2024
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat
Deming Chen
42
0
0
04 Apr 2024
DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
B. Ghavami
Amin Kamjoo
Lesley Shannon
S. Wilton
MQ
16
0
0
03 Apr 2024
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem
Conor McCullough
Randy Hsin
Chas Leichner
Shan Li
...
Andrew G. Howard
Lukasz Lew
Sherief Reda
Ville Rautio
Daniele Moro
MQ
50
0
0
29 Mar 2024
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Pingcheng Dong
Yonghao Tan
Dong Zhang
Tianwei Ni
Xuejiao Liu
...
Xijie Huang
Huaiyu Zhu
Yun Pan
Fengwei An
Kwang-Ting Cheng
MQ
36
3
0
28 Mar 2024
QNCD: Quantization Noise Correction for Diffusion Models
Huanpeng Chu
Wei Wu
Chengjie Zang
Kun Yuan
DiffM
MQ
41
4
0
28 Mar 2024
Tiny Machine Learning: Progress and Futures
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Song Han
52
51
0
28 Mar 2024
Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Kartikeya Bhardwaj
N. Pandey
Sweta Priyadarshi
Kyunggeun Lee
Jun Ma
Harris Teague
MQ
42
2
0
26 Mar 2024
Are Compressed Language Models Less Subgroup Robust?
Leonidas Gee
Andrea Zugarini
Novi Quadrianto
33
1
0
26 Mar 2024
Systematic construction of continuous-time neural networks for linear dynamical systems
Chinmay Datar
Adwait Datar
Felix Dietrich
W. Schilders
AI4TS
44
1
0
24 Mar 2024
Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations
J. MathavRaj
VM Kushala
Harikrishna Warrier
Yogesh Gupta
27
32
0
23 Mar 2024
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi
Yuxiang Chen
Kang Zhao
Kaijun Zheng
Jianfei Chen
Jun Zhu
MQ
45
20
0
19 Mar 2024
Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency
Hallgrimur Thorsteinsson
Valdemar J Henriksen
Tong Chen
Raghavendra Selvan
AAML
48
1
0
14 Mar 2024
CoroNetGAN: Controlled Pruning of GANs via Hypernetworks
Aman Kumar
Khushboo Anand
Shubham Mandloi
Ashutosh Mishra
Avinash Thakur
Neeraj Kasera
Prathosh A P
30
3
0
13 Mar 2024
LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng
Michael Davies
Pranav Pulijala
Karthikeyan Sankaralingam
Vikas Singh
38
5
0
12 Mar 2024
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen
Yu-Hsuan Chao
Yu-Jie Wang
Ming-Der Shieh
Chih-Chung Hsu
Wei-Fen Lin
MQ
40
1
0
11 Mar 2024
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang
Qingru Zhang
Souvik Kundu
Geonhwa Jeong
Zaoxing Liu
Tushar Krishna
Tuo Zhao
MQ
43
81
0
08 Mar 2024
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei
Yasaman Boreshban
Gholamreza Ghassem-Sani
Seyed Abolghasem Mirroshandel
MQ
41
0
0
08 Mar 2024
Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai
Zhekai Duan
Gaowen Liu
Charles Fleming
Chris Xiaoxuan Lu
VLM
38
4
0
07 Mar 2024
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Hanlin Tang
Yifu Sun
Decheng Wu
Kai Liu
Jianchen Zhu
Zhanhui Kang
MQ
28
10
0
05 Mar 2024
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe
Anastasios Kyrillidis
47
1
0
04 Mar 2024
FlowPrecision: Advancing FPGA-Based Real-Time Fluid Flow Estimation with Linear Quantization
Tianheng Ling
Julian Hoever
Chao Qian
Gregor Schiele
MQ
18
5
0
04 Mar 2024
BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs)
Sean Wellington
28
5
0
01 Mar 2024
Resilience of Entropy Model in Distributed Neural Networks
Milin Zhang
Mohammad Abdi
Shahriar Rifat
Francesco Restuccia
AAML
38
0
0
01 Mar 2024
Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta
Graham Todd
Marvin Zammit
Sam Earle
Antonios Liapis
Julian Togelius
Georgios N. Yannakakis
LLMAG
LM&MA
AI4CE
LRM
50
73
0
28 Feb 2024
FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang
Fei Yang
Shuang Peng
Fangyu Wang
Aimin Pan
MQ
36
2
0
28 Feb 2024
Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Yiwei Lu
Yaoliang Yu
Xinlin Li
Vahid Partovi Nia
MQ
38
3
0
27 Feb 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen
Qiao Yang
Senmao Tian
Shunli Zhang
MQ
28
1
0
27 Feb 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin
Jiangcun Du
Wuwei Huang
Wei Liu
Jian Luan
Bin Wang
Deyi Xiong
MQ
32
31
0
26 Feb 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou
Qiyang Zhao
Lina Bariah
Yu Tian
M. Bennis
S. Lasaulce
101
12
0
26 Feb 2024
Towards Accurate Post-training Quantization for Reparameterized Models
Luoming Zhang
Yefei He
Wen Fei
Zhenyu Lou
Weijia Wu
YangWei Ying
Hong Zhou
MQ
43
0
0
25 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILM
LRM
23
6
0
23 Feb 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
...
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun
53
24
0
21 Feb 2024
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen
Avner May
Ruslan Svirschevski
Yuhsun Huang
Max Ryabinin
Zhihao Jia
Beidi Chen
42
39
0
19 Feb 2024
Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao
Christof Monz
MQ
34
1
0
19 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
41
48
0
15 Feb 2024
Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
Zehong Wang
Zheyuan Zhang
Chuxu Zhang
Yanfang Ye
29
0
0
14 Feb 2024
Towards Meta-Pruning via Optimal Transport
Alexander Theus
Olin Geimer
Friedrich Wicke
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
MoMe
24
3
0
12 Feb 2024
Successive Refinement in Large-Scale Computation: Advancing Model Inference Applications
H. Esfahanizadeh
Alejandro Cohen
S. Shamai
Muriel Médard
34
1
0
11 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
45
7
0
08 Feb 2024
ApiQ: Finetuning of 2-Bit Quantized Large Language Model
Baohao Liao
Christian Herold
Shahram Khadivi
Christof Monz
CLL
MQ
47
12
0
07 Feb 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang
Yangdong Liu
Haotong Qin
Ying Li
Shiming Zhang
Xianglong Liu
Michele Magno
Xiaojuan Qi
MQ
82
69
0
06 Feb 2024
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang
Yuzhang Shang
Zhihang Yuan
Junyi Wu
Yan Yan
DiffM
MQ
11
28
0
06 Feb 2024
Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning
Weiqi Fu
Lianming Xu
Xin Wu
Li Wang
Aiguo Fei
21
0
0
03 Feb 2024
HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Matteo Risso
Chen Xie
Francesco Daghero
Alessio Burrello
Seyedmorteza Mollaei
Marco Castellano
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
25
0
0
02 Feb 2024
Effective Multi-Stage Training Model For Edge Computing Devices In Intrusion Detection
Thua Huynh Trong
Thanh Nguyen Hoang
21
3
0
31 Jan 2024
Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution
Wen Ma
Qiuwen Lou
Arman Kazemi
Julian Faraone
Tariq Afzal
SupR
36
0
0
26 Jan 2024