A White Paper on Neural Network Quantization
arXiv:2106.08295

15 June 2021
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
    MQ

Papers citing "A White Paper on Neural Network Quantization"

50 / 264 papers shown
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Tianchen Zhao
Ke Hong
Xinhao Yang
Xuefeng Xiao
Huixia Li
...
Ruiqi Xie
Siqi Chen
Hongyu Zhu
Y. Zhang
Yu Wang
MQ VGen
11
0
0
19 Jun 2025
SlotPi: Physics-informed Object-centric Reasoning Models
Jian Li
Wan Han
Ning Lin
Yu-Liang Zhan
Ruizhi Chengze
...
Yi-Feng Zhang
Hongsheng Liu
Zidong Wang
Fan Yu
Hao Sun
OCL LRM AI4CE
114
0
0
12 Jun 2025
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
Dhruv Parikh
Viktor Prasanna
36
0
0
10 Jun 2025
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
Bikash Dutta
Rishabh Ranjan
Shyam Sathvik
Mayank Vatsa
Richa Singh
13
0
0
07 Jun 2025
EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model
Alyssa Pinnock
Shakya Jayakody
Kawsher A Roxy
Md Rubel Ahmed
23
0
0
06 Jun 2025
FPTQuant: Function-Preserving Transforms for LLM Quantization
Boris van Breugel
Yelysei Bondarenko
Paul N. Whatmough
Markus Nagel
MQ
92
0
0
05 Jun 2025
FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review
Cédric Léonard
Dirk Stober
Martin Schulz
94
0
0
04 Jun 2025
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
Masaya Kawamura
Takuya Hasumi
Yuma Shirahata
Ryuichi Yamamoto
MQ
35
0
0
04 Jun 2025
Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition
Yoonjun Cho
Soeun Kim
Dongjae Jeon
Kyelim Lee
Beomsoo Lee
Albert No
MQ
27
0
0
02 Jun 2025
GSCodec Studio: A Modular Framework for Gaussian Splat Compression
Sicheng Li
Chengzhen Wu
H. Li
Xiang Gao
Yiyi Liao
Lu Yu
3DGS
55
0
0
02 Jun 2025
QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration
Jiatong Li
Libo Zhu
Haotong Qin
Jingkai Wang
Linghe Kong
Guihai Chen
Yulun Zhang
Xiaokang Yang
DiffM MQ
45
0
0
01 Jun 2025
INSIGHT: A Survey of In-Network Systems for Intelligent, High-Efficiency AI and Topology Optimization
Aleksandr Algazinov
Joydeep Chandra
Matt Laing
20
0
0
30 May 2025
Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data
Douwe den Blanken
Charlotte Frenkel
28
0
0
30 May 2025
Fusion Steering: Prompt-Specific Activation Control
Waldemar Chang
Alhassan Yasin
LLMSV
11
0
0
28 May 2025
Lightweight Embeddings with Graph Rewiring for Collaborative Filtering
Xurong Liang
Tong Chen
Wei Yuan
Hongzhi Yin
22
0
0
25 May 2025
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing
Zhaoyuan Su
Tingfeng Lan
Zirui Wang
Juncheng Yang
Yue Cheng
14
0
0
24 May 2025
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics
Zhihang Cai
Xingjun Zhang
Zhendong Tan
Zheng Wei
MQ
197
0
0
22 May 2025
Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs
Shmulik Markovich-Golan
Daniel Ohayon
Itay Niv
Yair Hanani
MQ
134
0
0
19 May 2025
Efficient Mixed Precision Quantization in Graph Neural Networks
Samir Moustafa
Nils M. Kriege
Wilfried Gansterer
GNN MQ
71
0
0
14 May 2025
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
95
0
0
13 May 2025
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Seunghee Han
S. Choi
Joo-Young Kim
59
0
0
09 May 2025
DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
Nilesh Prasad Pandey
Shriniwas Kulkarni
David Wang
Onat Gungor
Flavio Ponzina
T. Rosing
86
0
0
08 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Mingli Song
Jie Song
MQ
97
0
0
08 May 2025
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
Arnab Sanyal
Prithwish Mukherjee
Gourav Datta
Sandeep P. Chinchali
MQ
410
0
0
05 May 2025
Quantizing Diffusion Models from a Sampling-Aware Perspective
Qian Zeng
Jie Song
Yuanyu Wan
Huiqiong Wang
Mingli Song
DiffM MQ
119
1
0
04 May 2025
Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
Changjun Li
Runqing Jiang
Zhuo Song
Pengpeng Yu
Ye Zhang
Yulan Guo
MQ
143
0
0
01 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
128
0
0
21 Apr 2025
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper
Charbel Sakr
Ben Keller
Rangharajan Venkatesan
Kurt Keutzer
Siyang Song
Brucek Khailany
MQ
95
0
0
19 Apr 2025
Can LLMs Revolutionize the Design of Explainable and Efficient TinyML Models?
Christophe El Zeinaty
W. Hamidouche
Glenn Herrou
D. Ménard
Merouane Debbah
78
0
0
13 Apr 2025
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai
Yuma Ichikawa
MQ
107
0
0
13 Apr 2025
Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation
Sirine Arfa
Bernhard Vogginger
Chen Liu
Johannes Partzsch
Mark Schöne
Christian Mayr
81
0
0
09 Apr 2025
Achieving binary weight and activation for LLMs using Post-Training Quantization
Siqing Song
Chuang Wang
Ruiqi Wang
Yi Yang
Xuyao Zhang
MQ
130
0
0
07 Apr 2025
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
Ruichen Chen
Keith G. Mills
Di Niu
MQ
148
0
0
19 Mar 2025
Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
El-Mehdi El Arar
Silviu-Ioan Filip
Theo Mary
Elisa Riccietti
92
0
0
19 Mar 2025
A Systematic Review of ECG Arrhythmia Classification: Adherence to Standards, Fair Evaluation, and Embedded Feasibility
Guilherme Silva
Pedro H. L. Silva
Gladston J. P. Moreira
Vander L. S. Freitas
Jadson Gertrudes
Eduardo José da S. Luz
92
0
0
10 Mar 2025
Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables
Prarthana Bhattacharyya
Joshua Mitton
Ryan Page
Owen Morgan
Oliver Powell
...
Kemi Jacobs
Paolo Baesso
Taru Muhonen
R. Vigars
Louis Berridge
78
0
0
10 Mar 2025
Towards Superior Quantization Accuracy: A Layer-sensitive Approach
Feng Zhang
Yanbin Liu
Weihua Li
Jie Lv
Xiaodan Wang
Q. Bai
MQ
77
0
0
09 Mar 2025
Security and Real-time FPGA integration for Learned Image Compression
Alaa Mazouz
Carl De Sousa Tria
Sumanta Chaudhuri
Attilio Fiandrotti
Marco Cagnanzzo
Mihai P. Mitrea
Enzo Tartaglione
74
1
0
06 Mar 2025
Q&C: When Quantization Meets Cache in Efficient Image Generation
Xin Ding
Xiaochen Li
Haotong Qin
Zhibo Chen
DiffM MQ
173
0
0
04 Mar 2025
Dendron: Enhancing Human Activity Recognition with On-Device TinyML Learning
Hazem Hesham Yousef Shalby
Manuel Roveri
182
0
0
03 Mar 2025
Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
Pedram Bakhtiarifard
Pınar Tözün
Christian Igel
Raghavendra Selvan
119
0
0
27 Feb 2025
Phoeni6: a Systematic Approach for Evaluating the Energy Consumption of Neural Networks
Antônio Oliveira-Filho
Wellington Silva-de-Souza
Carlos Alberto Valderrama Sakuyama
Samuel Xavier-de-Souza
163
0
0
25 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
246
2
0
18 Feb 2025
QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
Jiajun Zhou
Yifan Yang
Kai Zhen
Ziyue Liu
Yequan Zhao
Ershad Banijamali
Athanasios Mouchtaris
Ngai Wong
Zheng Zhang
MQ
67
0
0
17 Feb 2025
Mitigating multiple single-event upsets during deep neural network inference using fault-aware training
Toon Vinck
Naïn Jonckers
Gert Dekkers
Jeffrey Prinzie
P. Karsmakers
AAML AI4CE
156
0
0
13 Feb 2025
Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation
Bálint Tóth
Dominik Senti
T. Ingolfsson
Jeffrey Zweidler
Alexandre Elsig
Luca Benini
Yawei Li
65
1
0
10 Feb 2025
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Pablo Piantanida
Elisabeth Gassiat
112
1
0
10 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
170
10
0
04 Feb 2025
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
Mengzhao Chen
Yi Liu
Jiahao Wang
Yi Bin
Wenqi Shao
Ping Luo
MQ
140
4
0
28 Jan 2025
FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices
Yuji Chai
Mujin Kwen
David Brooks
Gu-Yeon Wei
MQ
92
3
0
13 Jan 2025