SDQ: Sparse Decomposed Quantization for LLM Inference
arXiv 2406.13868
19 June 2024
Geonhwa Jeong, Po-An Tsai, S. Keckler, Tushar Krishna
Papers citing "SDQ: Sparse Decomposed Quantization for LLM Inference" (2 of 2 shown):
1. Semantic Retention and Extreme Compression in LLMs: Can We Have Both? Stanislas Laborde, Martin Cousseau, Antoun Yaacoub, Lionel Prevost. 12 May 2025.
2. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste. 31 Jan 2021.