Effectively Compress KV Heads for LLM
11 June 2024
Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu
Tags: MQ, VLM
Papers citing "Effectively Compress KV Heads for LLM" (4 of 4 shown)
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
16 May 2025 · Tags: MQ
Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
03 Apr 2025 · Tags: LLMAG, KELM
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Yiyuan Ma, Wenlei Bao, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen
28 Oct 2024 · Tags: VLM
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021