Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information

4 June 2025

Papers citing "Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information"


Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models. Seungcheol Park, Jeongin Bae, Beomseok Kwon, Minjun Kim, Byeongwook Kim, S. Kwon, U. Kang, Dongsoo Lee. 04 Jun 2025.
Zero-shot Quantization: A Comprehensive Survey. Minjun Kim, Jaehyeon Choi, Jongkeun Lee, Wonjin Cho, U. Kang. 14 May 2025.