SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

arXiv 2211.10438 · 18 November 2022
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
[MQ]
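
For context on the cited paper: SmoothQuant reduces the difficulty of quantizing activations (which contain large per-channel outliers) by migrating it into the weights. Each input channel j is scaled by s_j = max|X_j|^α / max|W_j|^(1-α), so the layer computes (X diag(s)^{-1})(diag(s) W), which is unchanged in exact arithmetic while both factors become easier to quantize to INT8. Below is a minimal PyTorch sketch of this offline smoothing step; the function name `smooth_linear` and the calibration statistic `act_max` are illustrative, not the authors' reference implementation.

```python
import torch

def smooth_linear(weight: torch.Tensor, act_max: torch.Tensor, alpha: float = 0.5):
    """Sketch of SmoothQuant-style smoothing for one linear layer.

    weight:  (out_features, in_features), used as y = x @ weight.T
    act_max: (in_features,) per-channel max |activation| from calibration data
    alpha:   migration strength; 0.5 is the paper's default
    """
    # Per-input-channel weight magnitudes; clamp to avoid division by zero.
    w_max = weight.abs().amax(dim=0).clamp(min=1e-5)
    # SmoothQuant scale: s_j = act_max_j**alpha / w_max_j**(1 - alpha)
    scales = (act_max.clamp(min=1e-5) ** alpha) / (w_max ** (1.0 - alpha))
    # Scale weight columns by s; the matching 1/s on the activations is
    # folded into the preceding LayerNorm (or prior layer) offline,
    # so layer outputs are mathematically unchanged.
    smoothed_weight = weight * scales
    return smoothed_weight, scales
```

Because the division by `scales` is folded into the preceding operation offline, smoothing adds no inference-time cost; only the resulting, flatter activation distribution needs to be quantized.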

Papers citing "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models"

Showing 50 of 533 citing papers.

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder (12 Jun 2024)

OPTune: Efficient Online Preference Tuning
Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Dinesh Manocha, Heng Huang (11 Jun 2024)

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin (11 Jun 2024)

TernaryLLM: Ternarized Large Language Model
Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, E. Barsoum, Peisong Wang, Jian Cheng (11 Jun 2024)

MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations
Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu (11 Jun 2024) [AAML]

Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel (10 Jun 2024) [MQ]

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Y. Lin (10 Jun 2024) [KELM]

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen (10 Jun 2024) [MoE]

Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang, Yihan Wang, Kai Li (10 Jun 2024)

Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang (07 Jun 2024)

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren (06 Jun 2024) [MQ]

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai (06 Jun 2024)

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi (06 Jun 2024)

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models
Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang-qiang Wang, Xiaowen Chu (05 Jun 2024)

Scalable MatMul-free Language Modeling
Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, P. Zhou, Jason Eshraghian (04 Jun 2024)

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang (03 Jun 2024)

MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin (02 Jun 2024) [MQ]

Empirical influence functions to understand the logic of fine-tuning
Jordan K Matelsky, Lyle Ungar, Konrad Paul Kording (01 Jun 2024)

Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh (31 May 2024) [MQ]

Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs
Davide Paglieri, Saurabh Dash, Tim Rocktaschel, Jack Parker-Holder (31 May 2024) [MQ]

LCQ: Low-Rank Codebook based Quantization for Large Language Models
Wen-Pu Cai, Wu-Jun Li (31 May 2024) [MQ]

One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments
Ke Yi, Yuhui Xu, Heng Chang, Chen Tang, Yuan Meng, Tong Zhang, Jia Li (30 May 2024) [MQ]

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang (30 May 2024) [MQ]

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann (29 May 2024)

Compressing Large Language Models using Low Rank and Low Precision Decomposition
R. Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci (29 May 2024) [MQ]

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu (28 May 2024)

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang, Yawei Li, Xinpeng Wang, Qianli Shen, Barbara Plank, Bernd Bischl, Mina Rezaei, Kenji Kawaguchi (28 May 2024)

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou (28 May 2024) [MQ]

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian (27 May 2024) [MQ]

Extreme Compression of Adaptive Neural Images
Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie (27 May 2024)

LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke, M. Toftrup, M. Kastoryano, Serge Belongie, Vésteinn Snæbjarnarson (26 May 2024) [MQ]

BOLD: Boolean Logic Deep Learning
Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran (25 May 2024) [AI4CE]

PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan (25 May 2024) [MQ]

PatchProt: Hydrophobic patch prediction using protein foundation models
Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln (24 May 2024)

SCALM: Towards Semantic Caching for Automated Chat Services with Large Language Models
Jiaxing Li, Chi Xu, Feng Wang, Isaac M von Riedemann, Cong Zhang, Jiangchuan Liu (24 May 2024) [LLMAG, KELM]

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi (23 May 2024) [MQ]

Embedding Compression for Efficient Re-Identification
Luke McDermott (23 May 2024)

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs
Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie (23 May 2024) [MQ]

Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang, Hayun Kim, Younghoon Kim (23 May 2024)

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
Mingjin Zhang, Jiannong Cao, Xiaoming Shen, Zeyang Cui (23 May 2024)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang (23 May 2024) [MQ]

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang (23 May 2024) [MQ]

TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan (23 May 2024) [MQ]

OAC: Output-adaptive Calibration for Accurate Post-training Quantization
Ali Edalati, Alireza Ghaffari, M. Asgharian, Lu Hou, Boxing Chen, Vahid Partovi Nia (23 May 2024) [MQ]

eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization
Aditya Agrawal, Matthew Hedlund, Blake A. Hechtman (22 May 2024) [MQ]

AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
Alireza Ghaffari, Sharareh Younesian, Vahid Partovi Nia, Boxing Chen, M. Asgharian (22 May 2024) [MQ]

ReALLM: A general framework for LLM compression and fine-tuning
Louis Leconte, Lisa Bedin, Van Minh Nguyen, Eric Moulines (21 May 2024) [MQ]

Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression
Peiyu Liu, Zeming Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen (21 May 2024) [MQ]

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin (10 May 2024) [MQ]

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han (07 May 2024)