QAQ: Quality Adaptive Quantization for LLM KV Cache

7 March 2024
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang · MQ

Papers citing "QAQ: Quality Adaptive Quantization for LLM KV Cache"

9 / 9 papers shown
1. TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
   A. Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni · MQ · 28 Apr 2025
2. GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
   M. Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck · 18 Mar 2025
3. More for Keys, Less for Values: Adaptive KV Cache Quantization
   Mohsen Hariri, Lam Nguyen, Sixu Chen, Shaochen Zhong, Qifan Wang, Xia Hu, Xiaotian Han, V. Chaudhary · MQ · 24 Feb 2025
4. iServe: An Intent-based Serving System for LLMs
   Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar · VLM · 08 Jan 2025
5. An Evolved Universal Transformer Memory
   Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang · 17 Oct 2024
6. QSpec: Speculative Decoding with Complementary Quantization Schemes
   Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu · MQ · 15 Oct 2024
7. Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
   Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo · 29 Jul 2024
8. KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
   Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, ..., Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu · 01 Jul 2024
9. Efficient LLM Inference with Kcache
   Qiaozhi He, Zhihua Wu · RALM · 28 Apr 2024