QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
arXiv:2404.00456 · 30 March 2024
Saleh Ashkboos
Amirkeivan Mohtashami
Maximilian L. Croci
Bo Li
Martin Jaggi
Dan Alistarh
Torsten Hoefler
James Hensman
MQ
ArXiv (abs) · PDF · HTML · GitHub (390★)
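Many of the citing papers below build on the paper's core idea: multiplying weights and activations by an orthogonal (Hadamard) rotation spreads outlier values across dimensions, so a symmetric 4-bit quantizer wastes far less of its range. A minimal toy sketch of that effect (not the authors' implementation; the `hadamard` and `quantize_int4` helpers are illustrative names):

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # H @ H.T == I

def quantize_int4(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor 4-bit quantize/dequantize round trip."""
    scale = np.abs(x).max() / 7.0  # symmetric int4 grid: [-7, 7]
    q = np.clip(np.round(x / scale), -7, 7)
    return q * scale

rng = np.random.default_rng(0)
x = rng.normal(size=64)
x[3] = 50.0  # inject a single large activation outlier

H = hadamard(64)
err_plain = np.abs(x - quantize_int4(x)).mean()
# Rotate, quantize, rotate back: because H is orthogonal the computation is
# equivalent, but the rotated tensor has no single dominant outlier, so the
# quantization scale is much smaller and the rounding error shrinks.
err_rot = np.abs(x - H.T @ quantize_int4(H @ x)).mean()
print(err_plain > err_rot)
```

Here the outlier forces the plain quantizer's scale to 50/7, so most values round to zero; after rotation the outlier's energy is spread over all 64 coordinates and the effective grid is an order of magnitude finer.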
Papers citing "QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs" (50 of 59 shown)
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Tianchen Zhao
Ke Hong
Xinhao Yang
Xuefeng Xiao
Huixia Li
...
Ruiqi Xie
Siqi Chen
Hongyu Zhu
Y. Zhang
Yu Wang
MQ
VGen
24
0
0
19 Jun 2025
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models
Junho Yoon
Geom Lee
Donghyeon Jeon
Inho Kang
Seung-Hoon Na
MQ
VLM
41
0
0
16 Jun 2025
HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations
Marco Federici
Riccardo Del Chiaro
Boris van Breugel
Paul N. Whatmough
Markus Nagel
MQ
47
0
0
11 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
36
0
0
09 Jun 2025
MANBench: Is Your Multimodal Model Smarter than Human?
Han Zhou
Qitong Xu
Yiheng Dong
Xin Yang
19
0
0
04 Jun 2025
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
Seungcheol Park
Jeongin Bae
Beomseok Kwon
Minjun Kim
Byeongwook Kim
S. Kwon
U. Kang
Dongsoo Lee
MQ
147
0
0
04 Jun 2025
QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration
Jiatong Li
Libo Zhu
Haotong Qin
Jingkai Wang
Linghe Kong
Guihai Chen
Yulun Zhang
Xiaokang Yang
DiffM
MQ
50
0
0
01 Jun 2025
DenoiseRotator: Enhance Pruning Robustness for LLMs via Importance Concentration
Tianteng Gu
Bei Liu
Bo Xiao
Ke Zeng
Jiacheng Liu
Y. Qian
52
0
0
29 May 2025
Compressing Sine-Activated Low-Rank Adapters through Post-Training Quantization
Cameron Gordon
Yiping Ji
Hemanth Saratchandran
Paul Albert
Simon Lucey
MQ
63
0
0
28 May 2025
FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
Daehyeon Baek
Jieun Choi
Jimyoung Son
Kyungmin Bin
Seungbeom Choi
Kihyo Moon
Minsung Jang
Hyojung Lee
MQ
25
0
0
27 May 2025
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong
Zhenheng Tang
Xiang Liu
Lujun Li
Xiaowen Chu
Bo Li
106
0
0
26 May 2025
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
MQ
33
0
0
24 May 2025
Why Do Some Inputs Break Low-Bit LLM Quantization?
Ting-Yun Chang
Muru Zhang
Jesse Thomason
Robin Jia
MQ
27
0
0
24 May 2025
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
Hao Gu
Lujun Li
Zheyu Wang
B. Liu
Qiyuan Zhu
Sirui Han
Yike Guo
MQ
23
0
0
24 May 2025
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?
Waleed Reda
Abhinav Jangda
Krishna Chintalapudi
118
0
0
23 May 2025
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park
Hyungmin Kim
Sangwoo kim
Wonseok Jeon
Juyoung Yang
Byeongwook Jeon
Yoonseon Oh
Jungwook Choi
192
0
0
21 May 2025
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto L. Castro
Andrei Panferov
Soroush Tabesh
Oliver Sieberling
Jiale Chen
Mahdi Nikdan
Saleh Ashkboos
Dan Alistarh
MQ
113
0
0
20 May 2025
Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis
Hong Huang
Dapeng Wu
112
0
0
20 May 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su
Yuechi Zhou
Quantong Qiu
Jilong Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
MQ
79
1
0
16 May 2025
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks
Chiyue Wei
Bowen Duan
Cong Guo
Jing Zhang
Qingyue Song
Hai "Helen" Li
Yiran Chen
115
0
0
16 May 2025
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Shihao Zhang
Haoyu Zhang
Ian Colbert
Rayan Saab
MQ
101
0
0
16 May 2025
Analog Foundation Models
Julian Büchel
Iason Chalas
Giovanni Acampa
An Chen
Omobayode Fagbohungbe
Sidney Tsai
Kaoutar El Maghraoui
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
MQ
118
0
0
14 May 2025
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim
Marwa El Halabi
W. Park
Clemens JS Schaefer
Deokjae Lee
Yeonhong Park
Jae W. Lee
Hyun Oh Song
MQ
148
1
0
11 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
473
1
0
09 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
65
0
0
05 May 2025
TeleSparse: Practical Privacy-Preserving Verification of Deep Neural Networks
Mohammad Maheri
Hamed Haddadi
Alex Davidson
122
0
0
27 Apr 2025
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Hongyu Wang
Shuming Ma
Furu Wei
MQ
96
4
0
25 Apr 2025
Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
Lucas Maisonnave
Cyril Moineau
Olivier Bichler
Fabrice Rastello
MQ
122
0
0
18 Apr 2025
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding
Bohan Hou
Xinyu Zhang
Allan Lin
Tianqi Chen
Cody Yu Hao
Yida Wang
Gennady Pekhimenko
125
0
0
17 Apr 2025
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai
Yuma Ichikawa
MQ
107
0
0
13 Apr 2025
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
Wenjin Ke
Zhe Li
D. Li
Lu Tian
E. Barsoum
MQ
101
3
0
12 Apr 2025
Achieving binary weight and activation for LLMs using Post-Training Quantization
Siqing Song
Chuang Wang
Ruiqi Wang
Yi Yang
Xuyao Zhang
MQ
134
0
0
07 Apr 2025
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
Yuhang Li
Ruokai Yin
Donghyun Lee
Shiting Xiao
Priyadarshini Panda
MQ
126
0
0
03 Apr 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
499
0
0
28 Mar 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
Joo-Young Kim
Jongse Park
122
0
0
24 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
173
9
0
11 Mar 2025
LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design
Renjie Wei
Songqiang Xu
Linfeng Zhong
Zebin Yang
Qingyu Guo
Yidan Wang
Runsheng Wang
Meng Li
150
1
0
24 Feb 2025
SpinQuant: LLM quantization with learned rotations
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
265
126
0
21 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
252
2
0
18 Feb 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
MQ
ALM
208
0
0
18 Feb 2025
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
Semyon Savkin
Eitan Porat
Or Ordentlich
Yury Polyanskiy
MQ
117
1
0
13 Feb 2025
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
H. Seo
Wongi Jeong
Jae-sun Seo
Se Young Chun
143
0
0
12 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
276
2
0
10 Feb 2025
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang
Yeyun Gong
Xiao Liu
Guoshuai Zhao
Ziyue Yang
Baining Guo
Zhengjun Zha
Peng Cheng
MQ
201
12
0
28 Jan 2025
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Xing Hu
Yuan Cheng
Dawei Yang
Zukang Xu
Zhihang Yuan
Jiangyong Yu
Chen Xu
Zhe Jiang
Sifan Zhou
MQ
107
15
0
23 Jan 2025
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Han Guo
William Brandon
Radostin Cholakov
Jonathan Ragan-Kelley
Eric P. Xing
Yoon Kim
MQ
169
16
0
20 Jan 2025
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
George A. Constantinides
Mohamed S. Abdelfattah
MQ
149
2
0
18 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
80
2
0
15 Nov 2024
The Super Weight in Large Language Models
Mengxia Yu
De Wang
Qi Shan
Colorado Reed
Alvin Wan
MQ
MILM
88
13
0
11 Nov 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
141
7
0
15 Oct 2024