2211.10438
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
18 November 2022
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
Papers citing
"SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models"
50 / 533 papers shown
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
189
0
0
27 Mar 2025
HOT: Hadamard-based Optimized Training
Seonggon Kim
Juncheol Shin
Seung-taek Woo
Eunhyeok Park
48
0
0
27 Mar 2025
QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
Yuxuan Hu
Xiaodong Chen
C. Li
Hongyu Chen
J. Zhang
MQ
60
0
0
25 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL
ReLM
LRM
83
31
0
20 Mar 2025
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu
Guangxuan Xiao
Haofeng Huang
Junxian Guo
Enze Xie
74
4
0
20 Mar 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
Haoyu Wang
VLM
MQ
91
0
0
20 Mar 2025
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers
Ruichen Chen
Keith G. Mills
Di Niu
MQ
59
0
0
19 Mar 2025
PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices
Yangyijian Liu
Jun Yu Li
Wu-Jun Li
36
0
0
15 Mar 2025
Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity
Chi Xu
Gefei Zhang
Yantong Zhu
Luca Benini
Guosheng Hu
Yawei Li
Zhihong Zhang
34
0
0
14 Mar 2025
OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models
Akshat Ramachandran
Mingyu Lee
Huan Xu
Souvik Kundu
Tushar Krishna
MQ
51
1
0
13 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
82
3
0
13 Mar 2025
ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
Xin Liu
Pei Liu
Guoming Tang
MoMe
54
0
0
13 Mar 2025
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng
Shuaiting Li
Zeyu Wang
Kedong Xu
Hong Gu
Kejie Huang
MQ
60
0
0
12 Mar 2025
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Pol G. Recasens
Ferran Agullo
Yue Zhu
Chen Wang
Eun Kyung Lee
Olivier Tardieu
Jordi Torres
Josep Ll. Berral
48
0
0
11 Mar 2025
Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
Ning Ding
Jing Han
Yuchuan Tian
Chao Xu
Kai Han
Yehui Tang
MQ
44
0
0
10 Mar 2025
SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model
Jing Zhang
Zhiyu Li
Qingyi Gu
MQ
VLM
56
0
0
09 Mar 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
Yangqiu Song
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
218
0
0
09 Mar 2025
TR-DQ: Time-Rotation Diffusion Quantization
Yihua Shao
Deyang Lin
Fanhu Zeng
Minxi Yan
Hao Fei
...
Haozhe Wang
Jiaxin Guo
Yan Wang
Haotong Qin
Hao Tang
MQ
DiffM
77
1
0
09 Mar 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang
Yufei Guo
Haifeng Sun
Tingting Yang
Zirui Zhuang
Wanyi Ning
Yuexi Yin
Q. Qi
Jianxin Liao
MQ
MoMe
51
0
0
07 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
66
1
0
06 Mar 2025
Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition
Bin Chen
Yu Zhang
Hongfei Ye
Ziyi Huang
Hongyang Chen
65
1
0
06 Mar 2025
TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition
Oliver Grainge
Michael Milford
I. Bodala
Sarvapali D. Ramchurn
Shoaib Ehsan
ViT
72
0
0
04 Mar 2025
Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu
Weiyu Huang
Zichen Liang
Chong Chen
Jintao Zhang
Jun Zhu
Jianfei Chen
MQ
47
2
0
28 Feb 2025
HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration
Rohan Juneja
Shivam Aggarwal
Safeen Huda
Tulika Mitra
L. Peh
50
0
0
27 Feb 2025
Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu
Zhitong Zheng
Cong Wang
TianHuang Su
ZhenYu Yang
MQ
67
0
0
26 Feb 2025
LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design
Renjie Wei
Songqiang Xu
Linfeng Zhong
Zebin Yang
Qingyu Guo
Yidan Wang
Runsheng Wang
Meng Li
84
0
0
24 Feb 2025
KVCrush: Key value cache size-reduction using similarity in head-behaviour
Gopi Krishna Jha
Sameh Gobriel
Liubov Talamanova
Alexander Kozlov
Nilesh Jain
MQ
39
0
0
24 Feb 2025
SpinQuant: LLM quantization with learned rotations
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
137
85
0
21 Feb 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang
Yuxin Zhang
Xiawu Zheng
Yong-Jin Liu
Jing Lin
Yiwu Yao
Rongrong Ji
97
1
0
21 Feb 2025
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
Sanghyun Yi
Qingfeng Liu
Mostafa El-Khamy
MQ
VGen
41
0
0
20 Feb 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
34
1
0
19 Feb 2025
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity
Junhao Hu
Wenrui Huang
Weidong Wang
Zhenwen Li
Tiancheng Hu
Zhixia Liu
Xusheng Chen
Tao Xie
Yizhou Shan
LRM
51
0
0
16 Feb 2025
Vertical Federated Learning in Practice: The Good, the Bad, and the Ugly
Zhaomin Wu
Zhen Qin
Junyi Hou
Haodong Zhao
Qinbin Li
Bingsheng He
Lixin Fan
FedML
76
2
0
12 Feb 2025
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
H. Seo
Wongi Jeong
Jae-sun Seo
Se Young Chun
62
0
0
12 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
64
1
0
10 Feb 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu
Changsheng Zhao
Hanxian Huang
Sijia Chen
Jing Zhang
...
Yuandong Tian
Bilge Soran
Raghuraman Krishnamoorthi
Tijmen Blankevoort
Vikas Chandra
MQ
81
3
0
04 Feb 2025
Symmetric Pruning of Large Language Models
Kai Yi
Peter Richtárik
AAML
VLM
73
0
0
31 Jan 2025
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang
Yeyun Gong
Xiao Liu
Guoshuai Zhao
Ziyue Yang
Baining Guo
Zhengjun Zha
Peng Cheng
MQ
67
5
0
28 Jan 2025
GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
Yanyu Chen
Ganhong Huang
108
0
0
28 Jan 2025
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault
Franck Mamalet
François Malgouyres
MQ
85
0
0
28 Jan 2025
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
Yonghong Tian
Yi Liu
Jiahao Wang
Yi Bin
Wenqi Shao
Ping Luo
MQ
66
3
0
28 Jan 2025
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Xing Hu
Yuan Cheng
Dawei Yang
Zukang Xu
Zhihang Yuan
Jiangyong Yu
Chen Xu
Zhe Jiang
Sifan Zhou
MQ
44
6
0
23 Jan 2025
Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian
Wayne Xin Zhao
Zhicheng Dou
MQ
46
0
0
22 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
L. Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
108
164
0
17 Jan 2025
Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach
Alireza Ghaffari
Sharareh Younesian
Boxing Chen
Vahid Partovi Nia
M. Asgharian
MQ
63
0
0
17 Jan 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
196
1
0
15 Jan 2025
FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices
Yuji Chai
Mujin Kwen
David Brooks
Gu-Yeon Wei
MQ
44
3
0
13 Jan 2025
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory
Jerry Chee
A. Backurs
Rainie Heck
Li Zhang
Janardhan Kulkarni
Thomas Rothvoss
Sivakanth Gopi
MQ
54
0
0
11 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
96
9
0
11 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
223
0
0
08 Jan 2025