Cited By: A White Paper on Neural Network Quantization (arXiv:2106.08295)
15 June 2021
Markus Nagel
Marios Fournarakis
Rana Ali Amjad
Yelysei Bondarenko
M. V. Baalen
Tijmen Blankevoort
MQ

Papers citing "A White Paper on Neural Network Quantization"

50 / 247 papers shown

Efficient Mixed Precision Quantization in Graph Neural Networks
  Samir Moustafa, Nils M. Kriege, Wilfried Gansterer [GNN, MQ] (14 May 2025)

Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
  Tollef Emil Jørgensen [MQ] (13 May 2025)

LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
  Seunghee Han, S. Choi, J. Kim (09 May 2025)

Diffusion Model Quantization: A Review
  Qian Zeng, Chenggong Hu, Mingli Song, Jie Song [MQ] (08 May 2025)

DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing
  Nilesh Prasad Pandey, Shriniwas Kulkarni, David Wang, Onat Gungor, Flavio Ponzina, T. Rosing (08 May 2025)

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
  Arnab Sanyal, Prithwish Mukherjee, Gourav Datta, Sandeep P. Chinchali [MQ] (05 May 2025)

Quantizing Diffusion Models from a Sampling-Aware Perspective
  Qian Zeng, Jie Song, Yuanyu Wan, Huiqiong Wang, Mingli Song [DiffM, MQ] (04 May 2025)

Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
  Changjun Li, Runqing Jiang, Zhuo Song, Pengpeng Yu, Ye Zhang, Yulan Guo [MQ] (01 May 2025)

StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
  Yeona Hong, Hyewon Han, Woo-Jin Chung, Hong-Goo Kang [MQ] (21 Apr 2025)

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
  Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Shri Kiran Srinivasan, Brucek Khailany [MQ] (19 Apr 2025)

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
  Yamato Arai, Yuma Ichikawa [MQ] (13 Apr 2025)

Can LLMs Revolutionize the Design of Explainable and Efficient TinyML Models?
  Christophe El Zeinaty, W. Hamidouche, Glenn Herrou, D. Ménard, Merouane Debbah (13 Apr 2025)

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation
  Sirine Arfa, Bernhard Vogginger, Chen Liu, Johannes Partzsch, Mark Schöne, Christian Mayr (09 Apr 2025)

Achieving binary weight and activation for LLMs using Post-Training Quantization
  Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xuyao Zhang [MQ] (07 Apr 2025)

FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
  Ruichen Chen, Keith G. Mills, Di Niu [MQ] (19 Mar 2025)

Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
  El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti (19 Mar 2025)

Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables
  Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Oliver Powell, ..., Kemi Jacobs, Paolo Baesso, Taru Muhonen, R. Vigars, Louis Berridge (10 Mar 2025)

A Systematic Review of ECG Arrhythmia Classification: Adherence to Standards, Fair Evaluation, and Embedded Feasibility
  Guilherme Silva, Pedro H. L. Silva, Gladston J. P. Moreira, Vander L. S. Freitas, Jadson Gertrudes, Eduardo José da S. Luz (10 Mar 2025)

Towards Superior Quantization Accuracy: A Layer-sensitive Approach
  Feng Zhang, Yanbin Liu, Weihua Li, Jie Lv, Xiaodan Wang, Q. Bai [MQ] (09 Mar 2025)

Security and Real-time FPGA integration for Learned Image Compression
  Alaa Mazouz, Carl De Sousa Tria, Sumanta Chaudhuri, A. Fiandrotti, Marco Cagnanzzo, Mihai P. Mitrea, Enzo Tartaglione (06 Mar 2025)

Q&C: When Quantization Meets Cache in Efficient Image Generation
  Xin Ding, X. Li, Haotong Qin, Zhibo Chen [DiffM, MQ] (04 Mar 2025)

Dendron: Enhancing Human Activity Recognition with On-Device TinyML Learning
  Hazem Hesham Yousef Shalby, Manuel Roveri (03 Mar 2025)

Climate And Resource Awareness is Imperative to Achieving Sustainable AI (and Preventing a Global AI Arms Race)
  Pedram Bakhtiarifard, Pınar Tözün, Christian Igel, Raghavendra Selvan (27 Feb 2025)

Phoeni6: a Systematic Approach for Evaluating the Energy Consumption of Neural Networks
  Antônio Oliveira-Filho, Wellington Silva-de-Souza, Carlos Alberto Valderrama Sakuyama, Samuel Xavier-de-Souza (25 Feb 2025)

More for Keys, Less for Values: Adaptive KV Cache Quantization
  Mohsen Hariri, Lam Nguyen, Sixu Chen, Shaochen Zhong, Qifan Wang, Xia Hu, Xiaotian Han, V. Chaudhary [MQ] (24 Feb 2025)

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
  J. Zhao, Hao Wu, Miao Zhang, Yuzhang Shang, Xuebo Liu, Yaowei Wang, Min Zhang, Liqiang Nie [MQ] (18 Feb 2025)

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
  Jiajun Zhou, Yifan Yang, Kai Zhen, Z. Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang [MQ] (17 Feb 2025)

Mitigating multiple single-event upsets during deep neural network inference using fault-aware training
  Toon Vinck, Naïn Jonckers, Gert Dekkers, Jeffrey Prinzie, P. Karsmakers [AAML, AI4CE] (13 Feb 2025)

Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation
  Bálint Tóth, Dominik Senti, T. Ingolfsson, Jeffrey Zweidler, Alexandre Elsig, Luca Benini, Yawei Li (10 Feb 2025)

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
  Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat (10 Feb 2025)

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
  Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, ..., Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra [MQ] (04 Feb 2025)

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
  Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo [MQ] (28 Jan 2025)

FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices
  Yuji Chai, Mujin Kwen, David Brooks, Gu-Yeon Wei [MQ] (13 Jan 2025)

PTQ4VM: Post-Training Quantization for Visual Mamba
  Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park [MQ, Mamba] (29 Dec 2024)

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
  Zining Wnag, J. Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, X. Liu (10 Dec 2024)

Unifying KV Cache Compression for Large Language Models with LeanKV
  Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen [MQ] (04 Dec 2024)

Behavior Backdoor for Deep Learning Models
  J. T. Wang, Pengfei Zhang, R. Tao, Jian Yang, Hao Liu, X. Liu, Y. X. Wei, Yao Zhao [AAML] (02 Dec 2024)

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
  Marco Federici, Davide Belli, M. V. Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul N. Whatmough (02 Dec 2024)

Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving
  Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe, Óscar Mata-Carballeira, M. Victoria Martínez (26 Nov 2024)

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
  Yu Zhang, Hao Wu, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, M. Yuan, Bei Yu [MQ] (25 Nov 2024)

Exploring the Robustness and Transferability of Patch-Based Adversarial Attacks in Quantized Neural Networks
  Amira Guesmi, B. Ouni, Muhammad Shafique [AAML] (22 Nov 2024)

Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations
  Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, ..., Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra (18 Nov 2024)

Stepping Forward on the Last Mile
  Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, R. Ramakrishnan, Zhaocong Yuan, Andrew Zou Li (06 Nov 2024)

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
  Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh (05 Nov 2024)

Improving DNN Modularization via Activation-Driven Training
  Tuan Ngo, Abid Hassan, Saad Shafiq, Nenad Medvidovic [MoMe] (01 Nov 2024)

IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
  Hang Guo, Yawei Li, Tao Dai, Shu-Tao Xia, Luca Benini [MQ] (29 Oct 2024)

Data Generation for Hardware-Friendly Post-Training Quantization
  Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi [MQ] (29 Oct 2024)

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
  Yuzhe Yang, Yipeng Du, Ahmad Farhan, Claudio Angione, Yue Zhao, Harry Yang, Fielding Johnston, James Buban, Patrick Colangelo (28 Oct 2024)

Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
  Wei Liu, Xue Xian Zheng, Jingyi Yu, Xin Lou [MQ] (25 Oct 2024)

TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
  Yuhang Li, Priyadarshini Panda [MQ] (24 Oct 2024)