ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

15 December 2017
Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,298 papers shown
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, ..., Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng (28 Mar 2024)

QNCD: Quantization Noise Correction for Diffusion Models
Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan (28 Mar 2024)

Tiny Machine Learning: Progress and Futures
Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han (28 Mar 2024)

Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Kartikeya Bhardwaj, N. Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague (26 Mar 2024)

Are Compressed Language Models Less Subgroup Robust?
Leonidas Gee, Andrea Zugarini, Novi Quadrianto (26 Mar 2024)

Systematic construction of continuous-time neural networks for linear dynamical systems
Chinmay Datar, Adwait Datar, Felix Dietrich, W. Schilders (24 Mar 2024)

Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations
J. MathavRaj, VM Kushala, Harikrishna Warrier, Yogesh Gupta (23 Mar 2024)

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu (19 Mar 2024)

Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency
Hallgrimur Thorsteinsson, Valdemar J Henriksen, Tong Chen, Raghavendra Selvan (14 Mar 2024)

CoroNetGAN: Controlled Pruning of GANs via Hypernetworks
Aman Kumar, Khushboo Anand, Shubham Mandloi, Ashutosh Mishra, Avinash Thakur, Neeraj Kasera, Prathosh A P (13 Mar 2024)

LookupFFN: Making Transformers Compute-lite for CPU inference
Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh (12 Mar 2024)

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin (11 Mar 2024)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao (08 Mar 2024)

The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei, Yasaman Boreshban, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel (08 Mar 2024)

Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, Zhekai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu (07 Mar 2024)

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang (05 Mar 2024)

Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe, Anastasios Kyrillidis (04 Mar 2024)

FlowPrecision: Advancing FPGA-Based Real-Time Fluid Flow Estimation with Linear Quantization
Tianheng Ling, Julian Hoever, Chao Qian, Gregor Schiele (04 Mar 2024)

BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs)
Sean Wellington (01 Mar 2024)

Resilience of Entropy Model in Distributed Neural Networks
Milin Zhang, Mohammad Abdi, Shahriar Rifat, Francesco Restuccia (01 Mar 2024)

Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis (28 Feb 2024)

FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan (28 Feb 2024)

Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia (27 Feb 2024)

Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang (27 Feb 2024)

A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong (26 Feb 2024)

GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou, Qiyang Zhao, Lina Bariah, Yu Tian, M. Bennis, S. Lasaulce (26 Feb 2024)

Towards Accurate Post-training Quantization for Reparameterized Models
Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou (25 Feb 2024)

Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu (23 Feb 2024)

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun (21 Feb 2024)

Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao, Christof Monz (19 Feb 2024)

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen (19 Feb 2024)

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He (15 Feb 2024)

Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye (14 Feb 2024)

Towards Meta-Pruning via Optimal Transport
Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh (12 Feb 2024)

Successive Refinement in Large-Scale Computation: Advancing Model Inference Applications
H. Esfahanizadeh, Alejandro Cohen, S. Shamai, Muriel Médard (11 Feb 2024)

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu (08 Feb 2024)

ApiQ: Finetuning of 2-Bit Quantized Large Language Model
Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz (07 Feb 2024)

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi (06 Feb 2024)

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan (06 Feb 2024)

Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning
Weiqi Fu, Lianming Xu, Xin Wu, Li Wang, Aiguo Fei (03 Feb 2024)

HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Matteo Risso, Chen Xie, Francesco Daghero, Luca Bompani, Seyedmorteza Mollaei, Marco Castellano, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari (02 Feb 2024)

Effective Multi-Stage Training Model For Edge Computing Devices In Intrusion Detection
Thua Huynh Trong, Thanh Nguyen Hoang (31 Jan 2024)

Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution
Wen Ma, Qiuwen Lou, Arman Kazemi, Julian Faraone, Tariq Afzal (26 Jan 2024)

Marabou 2.0: A Versatile Formal Analyzer of Neural Networks
Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, M. Daggitt, ..., Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark W. Barrett (25 Jan 2024)

CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
Andrei Tomut, S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, ..., Muhammad Ibrahim, Oussama Tahiri-Alaoui, John Malcolm, Samuel Mugel, Roman Orus (25 Jan 2024)

AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation
Lulan Shen, Ali Edalati, Brett H. Meyer, Warren Gross, James J. Clark (24 Jan 2024)

Robustness to distribution shifts of compressed networks for edge devices
Lulan Shen, Ali Edalati, Brett H. Meyer, Warren Gross, James J. Clark (22 Jan 2024)

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seongjin Kim, My T. Thai, Choong Seon Hong (22 Jan 2024)

Dynamic Q&A of Clinical Documents with Large Language Models
Ran Elgedawy, Ioana Danciu, Maria Mahbub, Sudarshan Srinivasan (19 Jan 2024)

A2Q+: Improving Accumulator-Aware Weight Quantization
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu (19 Jan 2024)