A White Paper on Neural Network Quantization
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort
MQ · 15 June 2021 · arXiv:2106.08295

Papers citing "A White Paper on Neural Network Quantization"

50 of 247 citing papers shown, newest first. Each entry lists the title, authors, topic tags (where shown), and publication date.

Opportunities and Challenges of Generative-AI in Finance
Akshar Prabhu Desai, Ganesh Satish Mallya, Mohammad Luqman, Tejasvi Ravi, Nithya Kota, Pranjul Yadav
AIFin · 21 Oct 2024

Lossless KV Cache Compression to 2%
Zhen Yang, Jizong Han, Kan Wu, Ruobing Xie, An Wang, Xingchen Sun, Zhanhui Kang
VLM, MQ · 20 Oct 2024

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
17 Oct 2024

Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng, Zhang Cao, Huaizhi Qu, Zhengyu Zhang, Chang Guo, Yanyong Zhang, Zhichao Cao, Tianlong Chen
17 Oct 2024

Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi, K. Denolf, Eric Dellinger
MQ · 15 Oct 2024

MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand, Arun Tejusve Raghunath Rajan, S. Idgunji, Anirban Ghosh, J. Holleman, ..., Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi
15 Oct 2024

SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, I. Soloveychik
MQ · 14 Oct 2024

Towards Reproducible Learning-based Compression
Jiahao Pang, M. Lodhi, Junghyun Ahn, Yuning Huang, Dong Tian
13 Oct 2024

Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li, Jianhang Hong, Yuanzhuo Wu, Snehal Adbol, Zonglin Li
MQ · 06 Oct 2024

Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs
Tianheng Ling, Chao Qian, Gregor Schiele
04 Oct 2024

Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen, P. Karsmakers
MQ · 30 Sep 2024

Accumulator-Aware Post-Training Quantization
Ian Colbert, Fabian Grob, Giuseppe Franco, Jinjie Zhang, Rayan Saab
MQ · 25 Sep 2024

PTQ4RIS: Post-Training Quantization for Referring Image Segmentation
Xiaoyan Jiang, Hang Yang, Kaiying Zhu, Xihe Qiu, Shibo Zhao, Sifan Zhou
MQ · 25 Sep 2024

Floating-floating point: a highly accurate number representation with flexible Counting ranges
Itamar Cohen, Gil Einziger
22 Sep 2024

Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment
Aditya Bansal, Michael Yuhas, Arvind Easwaran
OODD · 02 Sep 2024

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk
MQ · 31 Aug 2024

On-device AI: Quantization-aware Training of Transformers in Time-Series
Tianheng Ling, Gregor Schiele
AI4TS · 29 Aug 2024

DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Arjun Roy, Kaushik Roy
27 Aug 2024

Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models
Cheng Chen, Christina Giannoula, Andreas Moshovos
DiffM, MQ · 13 Aug 2024

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, Walsh Simon, Colm O'Riordan
MQ · 01 Aug 2024

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di, Jiahao Lu, Yunming Liang, Junjie Zheng, Yihua Wang, Chaofan Ding
ALM · 01 Aug 2024

TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
Zhaolan Huang, Adrien Tousnakhoff, Polina Kozyr, Roman Rehausen, Felix Biessmann, Robert Lachlan, C. Adjih, Emmanuel Baccelli
31 Jul 2024

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee
MQ · 29 Jul 2024

StreamTinyNet: video streaming analysis with spatial-temporal TinyML
Hazem Hesham Yousef Shalby, Massimo Pavan, Manuel Roveri
22 Jul 2024

Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao, Jie Ou, Lei Wang, Fanhua Shang, Jaji Wu
MQ · 22 Jul 2024

Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors
Matt Gorbett, Hossein Shirazi, Indrakshi Ray
MQ · 16 Jul 2024

Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar
MQ · 16 Jul 2024

QVD: Post-training Quantization for Video Diffusion Models
Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie
VGen, MQ · 16 Jul 2024

Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding
VLM, MQ · 15 Jul 2024

Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong, Woo-Jin Chung, Hong-Goo Kang
MQ · 12 Jul 2024

Trainable Highly-expressive Activation Functions
Irit Chelly, Shahaf E. Finder, Shira Ifergane, O. Freifeld
10 Jul 2024

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations
Bowen Shen, Zheng-Shen Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang
08 Jul 2024

Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang
DiffM, MQ · 04 Jul 2024

Low-latency machine learning FPGA accelerator for multi-qubit-state discrimination
P. Gautam, Shantharam Kalipatnapu, Shankaranarayanan H, Ujjawal Singhal, Benjamin Lienhard, Vibhor Singh, Chetan Singh Thakur
04 Jul 2024

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li
MQ · 03 Jul 2024

Exploring FPGA designs for MX and beyond
Ebby Samson, Naveen Mellempudi, Wayne Luk, G. Constantinides
MQ · 01 Jul 2024

Neural Texture Block Compression
S. Fujieda, Takahiro Harada
27 Jun 2024

Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng
MQ · 24 Jun 2024

xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
Daniil Larionov, Mikhail Seleznyov, Vasiliy Viskov, Alexander Panchenko, Steffen Eger
20 Jun 2024

FLoCoRA: Federated learning compression with low-rank adaptation
Lucas Grativol Ribeiro, Mathieu Léonardon, Guillaume Muller, Virginie Fresse, Matthieu Arzel
AI4CE · 20 Jun 2024

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet
16 Jun 2024

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu
MQ · 15 Jun 2024

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu
MQ · 14 Jun 2024

Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel
MQ · 10 Jun 2024

Binarized Diffusion Model for Image Super-Resolution
Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang
DiffM · 09 Jun 2024

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren
MQ · 06 Jun 2024

Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
04 Jun 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou, Jiawei Chen, J. Chen, Yuanzhe Chen, Zhuo Chen, ..., Wenjie Zhang, Yuhang Zhang, Zilin Zhao, Dejian Zhong, Xiaobin Zhuang
04 Jun 2024

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Widyadewi Soedarmadji, ..., Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang
MQ, VGen · 04 Jun 2024

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices
Han Zhang, Zifan Wang, Mihir Dhamankar, Matt Fredrikson, Yuvraj Agarwal
02 Jun 2024