Intriguing Properties of Quantization at Scale

Intriguing Properties of Quantization at Scale

30 May 2023

Bharat Venkitesh

Papers citing "Intriguing Properties of Quantization at Scale"

17 / 17 papers shown

Title
Fast and Low-Cost Genomic Foundation Models via Outlier Removal Haozheng Luo Chenghao Qiu Maojiang Su Zhihan Zhou Zoe Mehta Guo Ye Jerry Yao-Chieh Hu Han Liu AAML 55 1 0 01 May 2025
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models Lucas Maisonnave Cyril Moineau Olivier Bichler Fabrice Rastello MQ 71 1 0 30 Apr 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization Minsu Kim Seongmin Hong RyeoWook Ko S. Choi Hunjong Lee Junsoo Kim Joo-Young Kim Jongse Park 57 0 0 24 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models? Michael Hanna Sandro Pezzelle Yonatan Belinkov 50 0 0 14 Mar 2025
$u-$\mu$P: The Unit-Scaled Maximal Update Parametrization$ u- $\mu$ P: The Unit-Scaled Maximal Update Parametrization Charlie Blake C. Eichenberg Josef Dean Lukas Balles Luke Y. Prince Bjorn Deiseroth Andres Felipe Cruz Salinas Carlo Luschi Samuel Weinbach Douglas Orr 58 9 0 24 Jul 2024
How Does Quantization Affect Multilingual LLMs? Kelly Marchisio Saurabh Dash Hongyu Chen Dennis Aumiller Ahmet Üstün Sara Hooker Sebastian Ruder MQ 52 8 0 03 Jul 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models Amit Dhurandhar Tejaswini Pedapati Ronny Luss Soham Dan Aurélie C. Lozano Payel Das Georgios Kollias 22 3 0 28 Feb 2024
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models Luiza Amador Pozzobon Beyza Ermis Patrick Lewis Sara Hooker 36 20 0 11 Oct 2023
A Simple and Effective Pruning Approach for Large Language Models Mingjie Sun Zhuang Liu Anna Bair J. Zico Kolter 62 359 0 20 Jun 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU Ying Sheng Lianmin Zheng Binhang Yuan Zhuohan Li Max Ryabinin ... Joseph E. Gonzalez Percy Liang Christopher Ré Ion Stoica Ce Zhang 149 369 0 13 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model Aohan Zeng Xiao Liu Zhengxiao Du Zihan Wang Hanyu Lai ... Jidong Zhai Wenguang Chen Peng-Zhen Zhang Yuxiao Dong Jie Tang BDL LRM 253 1,073 0 05 Oct 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency Giovanni Puccetti Anna Rogers Aleksandr Drozd F. Dell’Orletta 79 42 0 23 May 2022
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation Orevaoghene Ahia Julia Kreutzer Sara Hooker 118 51 0 06 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 253 698 0 27 Aug 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 264 4,489 0 23 Jan 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 950 20,572 0 17 Apr 2017
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 266 7,638 0 03 Jul 2012