ResearchTrend.AI

A White Paper on Neural Network Quantization
arXiv:2106.08295 · 15 June 2021
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort
Topics: MQ

Papers citing "A White Paper on Neural Network Quantization"
(showing 50 of 264 citing papers)
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning
Laura Puccioni, Alireza Farshin, Mariano Scazzariello, Changjie Wang, Marco Chiesa, Dejan Kostic
10 Jan 2025

PTQ4VM: Post-Training Quantization for Visual Mamba
Jun-gyu Jin, Changhun Lee, Seonggon Kim, Eunhyeok Park
29 Dec 2024 · Topics: MQ, Mamba

PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
Zining Wang, Jinpei Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu
10 Dec 2024
GAQAT: Gradient-Adaptive Quantization-Aware Training for Domain Generalization
Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu
07 Dec 2024 · Topics: MQ

Unifying KV Cache Compression for Large Language Models with LeanKV
Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen
04 Dec 2024 · Topics: MQ

Behavior Backdoor for Deep Learning Models
Jinqiao Wang, Pengfei Zhang, R. Tao, Jian Yang, Hao Liu, Xianglong Liu, Y. X. Wei, Yao Zhao
02 Dec 2024 · Topics: AAML
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici, Davide Belli, M. V. Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul N. Whatmough
02 Dec 2024

Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving
Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe, Óscar Mata-Carballeira, M. Victoria Martínez
26 Nov 2024

LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization
Rui Xie, Tianchen Zhao, Zhihang Yuan, Rui Wan, Wenxi Gao, Zhenhua Zhu, Xuefei Ning, Yu Wang
26 Nov 2024 · Topics: VGen, MQ
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
Yu Zhang, Ming Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, Mingxuan Yuan, Bei Yu
25 Nov 2024 · Topics: MQ

Exploring the Robustness and Transferability of Patch-Based Adversarial Attacks in Quantized Neural Networks
Amira Guesmi, B. Ouni, Mohamed Bennai
22 Nov 2024 · Topics: AAML

Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations
Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, ..., Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra
18 Nov 2024
Stepping Forward on the Last Mile
Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, R. Ramakrishnan, Zhaocong Yuan, Andrew Zou Li
06 Nov 2024

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
Jason Vega, Junsheng Huang, Gaokai Zhang, Hangoo Kang, Minjia Zhang, Gagandeep Singh
05 Nov 2024

Improving DNN Modularization via Activation-Driven Training
Tuan Ngo, Abid Hassan, Saad Shafiq, Nenad Medvidovic
01 Nov 2024 · Topics: MoMe

Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi
29 Oct 2024 · Topics: MQ
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
Hang Guo, Yawei Li, Tao Dai, Shu-Tao Xia, Luca Benini
29 Oct 2024 · Topics: MQ

Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang, Yipeng Du, Ahmad Farhan, Claudio Angione, Yue Zhao, Harry Yang, Fielding Johnston, James Buban, Patrick Colangelo
28 Oct 2024

Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
Wen Liu, Xue Xian Zheng, Jingyi Yu, Xin Lou
25 Oct 2024 · Topics: MQ

TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li, Priyadarshini Panda
24 Oct 2024 · Topics: MQ
Opportunities and Challenges of Generative-AI in Finance
Akshar Prabhu Desai, Ganesh Satish Mallya, Mohammad Luqman, Tejasvi Ravi, Nithya Kota, Pranjul Yadav
21 Oct 2024 · Topics: AIFin

Lossless KV Cache Compression to 2%
Zhen Yang, Jizong Han, Kan Wu, Ruobing Xie, An Wang, Xingwu Sun, Zhanhui Kang
20 Oct 2024 · Topics: VLM, MQ

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
17 Oct 2024

Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng, Zhang Cao, Huaizhi Qu, Zhengyu Zhang, Chang Guo, Yanyong Zhang, Zhichao Cao, Tianlong Chen
17 Oct 2024
Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi, K. Denolf, Eric Dellinger
15 Oct 2024 · Topics: MQ

MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand, Arun Tejusve Raghunath Rajan, S. Idgunji, Anirban Ghosh, J. Holleman, ..., Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi
15 Oct 2024

SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, I. Soloveychik
14 Oct 2024 · Topics: MQ

Towards Reproducible Learning-based Compression
Jiahao Pang, M. Lodhi, Junghyun Ahn, Yuning Huang, Dong Tian
13 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li, Jianhang Hong, Yuanzhuo Wu, Snehal Adbol, Zonglin Li
06 Oct 2024 · Topics: MQ

Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
Tianheng Ling, Chao Qian, Gregor Schiele
04 Oct 2024

Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen, P. Karsmakers
30 Sep 2024 · Topics: MQ

Accumulator-Aware Post-Training Quantization
Ian Colbert, Fabian Grob, Giuseppe Franco, Jinjie Zhang, Rayan Saab
25 Sep 2024 · Topics: MQ

PTQ4RIS: Post-Training Quantization for Referring Image Segmentation
Xiaoyan Jiang, Hang Yang, Kaiying Zhu, Xihe Qiu, Shibo Zhao, Sifan Zhou
25 Sep 2024 · Topics: MQ
Floating-floating point: a highly accurate number representation with flexible counting ranges
Itamar Cohen, Gil Einziger
22 Sep 2024

Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment
Aditya Bansal, Michael Yuhas, Arvind Easwaran
02 Sep 2024 · Topics: OODD

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk
31 Aug 2024 · Topics: MQ

On-device AI: Quantization-aware Training of Transformers in Time-Series
Tianheng Ling, Gregor Schiele
29 Aug 2024 · Topics: AI4TS

DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Arjun Roy, Kaushik Roy
27 Aug 2024
Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models
Cheng Chen, Christina Giannoula, Andreas Moshovos
13 Aug 2024 · Topics: DiffM, MQ

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, Walsh Simon, Colm O'Riordan
01 Aug 2024 · Topics: MQ

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di, Jiahao Lu, Yunming Liang, Junjie Zheng, Yihua Wang, Chaofan Ding
01 Aug 2024 · Topics: ALM

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
Zhaolan Huang, Adrien Tousnakhoff, Polina Kozyr, Roman Rehausen, Felix Biessmann, Robert Lachlan, C. Adjih, Emmanuel Baccelli
31 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee
29 Jul 2024 · Topics: MQ

StreamTinyNet: video streaming analysis with spatial-temporal TinyML
Hazem Hesham Yousef Shalby, Massimo Pavan, Manuel Roveri
22 Jul 2024

Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao, Jie Ou, Lei Wang, Fanhua Shang, Jaji Wu
22 Jul 2024 · Topics: MQ

Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
Matt Gorbett, Hossein Shirazi, Indrakshi Ray
16 Jul 2024 · Topics: MQ
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar
16 Jul 2024 · Topics: MQ

QVD: Post-training Quantization for Video Diffusion Models
Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie
16 Jul 2024 · Topics: VGen, MQ

Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding
15 Jul 2024 · Topics: VLM, MQ

Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong, Woo-Jin Chung, Hong-Goo Kang
12 Jul 2024 · Topics: MQ