NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

7 August 2024
Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu
ArXiv (abs) · PDF · HTML · GitHub (1737★)

Papers citing "NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time"

6 / 6 papers shown
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?
Adithya Bhaskar, Alexander Wettig, Tianyu Gao, Yihe Dong, Danqi Chen
15 · 0 · 0 · 20 Jun 2025

LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning
Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo
LRM
17 · 0 · 0 · 19 Jun 2025

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sébastien Montella, Mirella Lapata, Kam-Fai Wong, Jeff Z. Pan
KELM · MU
235 · 5 · 0 · 01 May 2025

In-context KV-Cache Eviction for LLMs via Attention-Gate
Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang, Zhijie Deng
123 · 2 · 0 · 15 Oct 2024

FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang, Jinle Zeng, Xiyuan Xiao, Siming Wu, Jiabin Yang, Lujing Zheng, Zeyu Chen, Jiang Bian, Dianhai Yu, Haifeng Wang
386 · 3 · 0 · 02 Oct 2024

Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
168 · 478 · 0 · 06 Nov 2019