ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Effective Interplay between Sparsity and Quantization: From Theory to Practice (arXiv:2405.20935)

31 May 2024
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh [MQ]

Papers citing "Effective Interplay between Sparsity and Quantization: From Theory to Practice"

10 papers shown
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression (23 Feb 2025)
Xiaoyi Qu, David Aponte, Colby R. Banbury, Daniel P. Robinson, Tianyu Ding, K. Koishida, Ilya Zharkov, Tianyi Chen [MQ]

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models (08 Oct 2024)
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models (11 Jun 2024)
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization (10 Jun 2024)
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Y. Lin [KELM]

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers (07 Feb 2024)
A. Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

The Falcon Series of Open Language Models (28 Nov 2023)
Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra-Aimée Cojocaru, ..., Quentin Malartic, Daniele Mazzotta, Badreddine Noune, B. Pannier, Guilherme Penedo [AI4TS, ALM]

Dynamic Sparse Training with Structured Sparsity (03 May 2023)
Mike Lasby, A. Golubeva, Utku Evci, Mihai Nica, Yani Andrew Ioannou

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask (15 Sep 2022)
Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, T. Krishna

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization (23 May 2022)
Peng Hu, Xi Peng, Hongyuan Zhu, M. Aly, Jie Lin [MQ]

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks (16 Feb 2021)
Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, S. Naor, Daniel Soudry