Prefixing Attention Sinks can Mitigate Activation Outliers for Large
Language Model Quantization

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization

17 June 2024

ArXiv (abs)PDF HTML

Papers citing "Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization"

14 / 14 papers shown

Title
Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs Lucas Maisonnave Cyril Moineau Olivier Bichler Fabrice Rastello MQ 115 0 0 18 Apr 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models Yanwen Huang Yong Zhang Ning Cheng Zhitao Li Shaojun Wang Jing Xiao 167 0 0 02 Jan 2025
House of Cards: Massive Weights in LLMs Jaehoon Oh Seungjun Shin Dokwan Oh 103 1 0 02 Oct 2024
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Ji Lin Jiaming Tang Haotian Tang Shang Yang Wei-Ming Chen Wei-Chen Wang Guangxuan Xiao Xingyu Dang Chuang Gan Song Han EDL MQ 108 578 0 01 Jun 2023
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model BigScience Workshop : Teven Le Scao Angela Fan Christopher Akiki ... Zhongli Xie Zifan Ye M. Bras Younes Belkada Thomas Wolf VLM 417 2,393 0 09 Nov 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Tim Dettmers M. Lewis Younes Belkada Luke Zettlemoyer MQ 119 664 0 15 Aug 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers Z. Yao Reza Yazdani Aminabadi Minjia Zhang Xiaoxia Wu Conglong Li Yuxiong He VLM MQ 142 479 0 04 Jun 2022
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality William Timkey Marten van Schijndel 301 116 0 09 Sep 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers Olga Kovaleva Saurabh Kulshreshtha Anna Rogers Anna Rumshisky 101 92 0 14 May 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation Xiang Lisa Li Percy Liang 252 4,305 0 01 Jan 2021
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel Noam M. Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li Peter J. Liu AIMat 506 20,376 0 23 Oct 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Mohammad Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 343 1,920 0 17 Sep 2019
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference Benoit Jacob S. Kligys Bo Chen Menglong Zhu Matthew Tang Andrew G. Howard Hartwig Adam Dmitry Kalenichenko MQ 167 3,148 0 15 Dec 2017
Pointer Sentinel Mixture Models Stephen Merity Caiming Xiong James Bradbury R. Socher RALM 349 2,900 0 26 Sep 2016

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.