arXiv:2109.12948
Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort · MQ · 27 September 2021
Papers citing "Understanding and Overcoming the Challenges of Efficient Transformer Quantization" (50 / 96 papers shown)
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen · MQ · 13 May 2025

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
Arnab Sanyal, Prithwish Mukherjee, Gourav Datta, Sandeep P. Chinchali · MQ · 05 May 2025

Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar · 05 May 2025

Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models
Lucas Maisonnave, Cyril Moineau, Olivier Bichler, Fabrice Rastello · MQ · 30 Apr 2025

Compressing Language Models for Specialized Domains
Miles Williams, G. Chrysostomou, Vitor Jeronymo, Nikolaos Aletras · MQ · 25 Feb 2025
Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval
Vera Pavlova · 20 Jan 2025

Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao · 02 Jan 2025

The Super Weight in Large Language Models
Mengxia Yu, De Wang, Qi Shan, Colorado Reed, Alvin Wan · MQ, MILM · 11 Nov 2024

Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust
Vera Pavlova, Mohammed Makhlouf · 09 Nov 2024

From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi, Jiankang Deng · 22 Oct 2024

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei · 17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen, Bike Xie, Jundong Li, Cong Shen · MQ · 16 Oct 2024

SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, I. Soloveychik · MQ · 14 Oct 2024

Frontiers of Deep Learning: From Novel Application to Real-World Deployment
Rui Xie · VLM · 19 Jul 2024

RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Xijie Huang, Zechun Liu, Shih-yang Liu, Kwang-Ting Cheng · MQ · 10 Jul 2024

Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang · DiffM, MQ · 04 Jul 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu · MQ · 25 Jun 2024

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee · MQ · 17 Jun 2024

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, K. Riedhammer, Tobias Bocklet · MQ · 16 Jun 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee, Wonbeom Lee, Jaewoong Sim · MQ · 16 Jun 2024

Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel · MQ · 10 Jun 2024
Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang, Yihan Wang, Kai Li · 10 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann · 29 May 2024

Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim · 23 May 2024

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks
Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, ..., Zhenghua Chen, M. Aly, Jie Lin, Min-man Wu, Xiaoli Li · 09 May 2024

Efficient Compression of Multitask Multilingual Speech Models
Thomas Palmeira Ferraz · 02 May 2024

Quantization of Large Language Models with an Overdetermined Basis
D. Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan V. Oseledets, Ekaterina A. Muravleva, A. Mikhalev, Boris Kashin · MQ · 15 Apr 2024
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu · 04 Apr 2024

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu · 04 Apr 2024

Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui, Qianle Wang · MQ · 03 Apr 2024

Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Junyong Cheon, Bumsub Ham · MQ, ViT · 01 Apr 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai · 18 Mar 2024

ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo · 15 Mar 2024
GRITv2: Efficient and Light-weight Social Relation Recognition
Sagar Reddy, Neeraj Kasera, Avinash Thakur · ViT · 11 Mar 2024

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin · MQ · 11 Mar 2024

FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan · MQ · 28 Feb 2024

Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu · 27 Feb 2024

Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao, Christof Monz · MQ · 19 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He · MQ · 15 Feb 2024
A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao · 05 Feb 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Y. Shao, Kurt Keutzer, A. Gholami · MQ · 31 Jan 2024

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices
Junchen Zhao, Yurun Song, Simeng Liu, Ian G. Harris, S. Jyothi · 01 Dec 2023

Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Jangwhan Lee, Minsoo Kim, Seungcheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi · MQ · 09 Nov 2023

Exploring Post-Training Quantization of Protein Language Models
Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan · MQ · 30 Oct 2023
ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers
Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He · MQ · 26 Oct 2023

LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Shih-yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng · MQ · 25 Oct 2023

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao · MQ, VLM · 20 Oct 2023

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang · MQ · 12 Oct 2023

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Cheng Zhang, Jianyi Cheng, Ilia Shumailov, G. Constantinides, Yiren Zhao · MQ · 08 Oct 2023

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou · MQ · 07 Oct 2023