ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (arXiv:1712.05877)
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
    MQ

Papers citing "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"

50 / 1,256 papers shown
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM
MQ
49
0
0
28 Jul 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
40
0
0
26 Jul 2024
Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models
Aayush Saxena
Arit Kumar Bishwas
Ayush Ashok Mishra
Ryan Armstrong
21
1
0
22 Jul 2024
Inverted Activations
Georgii Sergeevich Novikov
Ivan Oseledets
26
0
0
22 Jul 2024
StreamTinyNet: video streaming analysis with spatial-temporal TinyML
Hazem Hesham Yousef Shalby
Massimo Pavan
Manuel Roveri
45
0
0
22 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
55
0
0
22 Jul 2024
Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns
Christos Kyrkou
42
0
0
20 Jul 2024
Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Giorgos Armeniakos
Alexis Maras
S. Xydis
Dimitrios Soudris
MQ
26
3
0
19 Jul 2024
Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Alessandro Pierro
Steven Abreu
MQ
Mamba
45
6
0
17 Jul 2024
NITRO-D: Native Integer-only Training of Deep Convolutional Neural Networks
Alberto Pirillo
Luca Colombo
Manuel Roveri
MQ
29
0
0
16 Jul 2024
QVD: Post-training Quantization for Video Diffusion Models
Shilong Tian
Hong Chen
Chengtao Lv
Yu Liu
Jinyang Guo
Xianglong Liu
Shengxi Li
Hao Yang
Tao Xie
VGen
MQ
46
3
0
16 Jul 2024
On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers
M. Deutel
Frank Hannig
Christopher Mutschler
Jürgen Teich
MQ
30
0
0
15 Jul 2024
A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second
Blaž Škrlj
Benjamin Ben-Shalom
Grega Gaspersic
Adi Schwartz
Ramzi Hoseisi
Naama Ziporin
Davorin Kopic
Andraz Tori
45
0
0
14 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
V. Cevher
Yida Wang
George Karypis
45
3
0
12 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong
Woo-Jin Chung
Hong-Goo Kang
MQ
31
1
0
12 Jul 2024
Real-Time Anomaly Detection and Reactive Planning with Large Language Models
Rohan Sinha
Amine Elhafsi
Christopher Agia
Matthew Foutter
Edward Schmerling
Marco Pavone
OffRL
LRM
45
26
0
11 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala
Alind Khare
Animesh Agrawal
Igor Fedorov
Hugo Latapie
Myungjin Lee
Alexey Tumanov
CLL
42
0
0
08 Jul 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
58
14
0
08 Jul 2024
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
Jingyang Xiang
Zuohui Chen
Siqi Li
Qing Wu
Yong-Jin Liu
28
1
0
07 Jul 2024
ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters
B. Ghavami
M. Shahidzadeh
Lesley Shannon
S. Wilton
55
0
0
06 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
42
1
0
05 Jul 2024
Resource-Efficient Speech Quality Prediction through Quantization Aware Training and Binary Activation Maps
Mattias Nilsson
Riccardo Miccini
Clément Laroche
Tobias Piechowiak
Friedemann Zenke
MQ
34
0
0
05 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao
MQ
27
0
0
05 Jul 2024
Gaussian Eigen Models for Human Heads
Wojciech Zielonka
Timo Bolkart
Thabo Beeler
Justus Thies
3DGS
49
5
0
05 Jul 2024
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao
Feng Tian
Jun Chen
Haonan Lin
Guang Dai
Yong Liu
Jingdong Wang
DiffM
MQ
46
5
0
04 Jul 2024
Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Xiaokun Luan
Xiyue Zhang
Jingyi Wang
Meng Sun
AAML
23
0
0
04 Jul 2024
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee
Seongmin Park
S. Hong
Minsoo Kim
Du-Seong Chang
Jungwook Choi
37
4
0
03 Jul 2024
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
Yanfeng Jiang
Ning Sun
Xueshuo Xie
Fei Yang
Tao Li
MQ
44
2
0
03 Jul 2024
CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications
Yupeng Cao
Zhiyuan Yao
Zhi Chen
Zhiyang Deng
28
1
0
02 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm
DiffM
41
1
0
01 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Alessio Burrello
Enrico Macii
M. Poncino
Daniele Jahier Pagliari
MQ
58
2
0
01 Jul 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu
Yongji Wu
Xinyu Yang
Beidi Chen
Matthew Lentz
Danyang Zhuo
Lisa Wu Wills
50
0
0
29 Jun 2024
SCOPE: Stochastic Cartographic Occupancy Prediction Engine for Uncertainty-Aware Dynamic Navigation
Zhanteng Xie
P. Dames
39
1
0
28 Jun 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang
Yuexi Yin
Haifeng Sun
Qi Qi
Jingyu Wang
Zirui Zhuang
Tingting Yang
Jianxin Liao
46
2
0
27 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen
Yuan Meng
Chen Tang
Xinzhu Ma
Jingyan Jiang
Xin Wang
Zhi Wang
Wenwu Zhu
MQ
31
23
0
25 Jun 2024
TRAWL: Tensor Reduced and Approximated Weights for Large Language Models
Yiran Luo
Het Patel
Yu Fu
Dawon Ahn
Jia Chen
Yue Dong
Evangelos E. Papalexakis
41
1
0
25 Jun 2024
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Andrea Posada
Daniel Rueckert
Felix Meissen
Philip Muller
LM&MA
ELM
37
0
0
24 Jun 2024
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao
Jie Ou
Lei Wang
Yuting Xiao
Zhiyuan Xiang
Ruiting Dai
Jun Cheng
MQ
36
3
0
24 Jun 2024
MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication
Shubhabrata Mukherjee
Cory Beard
Sejun Song
43
0
0
22 Jun 2024
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son
Wonpyo Park
Woohyun Han
Kyuyeun Kim
Jaeho Lee
MQ
37
10
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
45
1
0
17 Jun 2024
InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models
Juseon-Do
Jingun Kwon
Hidetaka Kamigaito
Manabu Okumura
34
2
0
16 Jun 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
32
1
0
16 Jun 2024
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee
Wonbeom Lee
Jaewoong Sim
MQ
42
14
0
16 Jun 2024
Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training
Akul Malhotra
S. Gupta
13
0
0
15 Jun 2024
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation
Injoon Hwang
Haewon Park
Youngwan Lee
Jooyoung Yang
SunJae Maeng
AI4CE
16
0
0
13 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Ruihao Gong
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
45
0
0
13 Jun 2024
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
Christopher Wolters
Xiaoxuan Yang
Ulf Schlichtmann
Toyotaro Suzumura
39
11
0
12 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
66
1
0
12 Jun 2024
Markov Constraint as Large Language Model Surrogate
Alexandre Bonlarron
Jean-Charles Régin
32
1
0
11 Jun 2024