Effectively Compress KV Heads for LLM
11 June 2024
Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu
Tags: MQ, VLM
Papers citing "Effectively Compress KV Heads for LLM" (4 of 4 shown)
Accurate KV Cache Quantization with Outlier Tokens Tracing
Yi Su, Yuechi Zhou, Quantong Qiu, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang
16 May 2025 · Tags: MQ
Cognitive Memory in Large Language Models
Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu
03 Apr 2025 · Tags: LLMAG, KELM
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Yiyuan Ma, Wenlei Bao, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen
28 Oct 2024 · Tags: VLM
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021