KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

3 June 2025
Jiahao Wang
Jinbo Han
Xingda Wei
Sijie Shen
Dingyan Zhang
Chenguang Fang
Rong Chen
Wenyuan Yu
Haibo Chen
Main: 15 pages · 31 figures · 2 tables · Bibliography: 1 page · Appendix: 2 pages
Abstract

Serving large language models (LLMs) is important for cloud providers, and caching intermediate results (the KV cache) after processing each request substantially improves serving throughput and latency. However, there is limited understanding of how LLM serving benefits from KV caching, as system design decisions such as cache eviction policies are highly workload-dependent. In this paper, we present the first systematic characterization of KV cache workload patterns from one of the leading LLM service providers. We draw observations not covered by previous studies that focus on synthetic workloads, including: KV cache reuses are skewed across requests, and reuses between single-turn requests are as important as those between multi-turn requests; reuse time and probability vary widely across requests as a whole, but within a specific request category the pattern tends to be predictable; and the overall cache size required for an ideal cache hit ratio is moderate. Based on this characterization, we further propose a workload-aware cache eviction policy that improves serving performance under real-world traces, especially with limited cache capacity.
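
The abstract does not spell out the proposed eviction policy. As a rough, hypothetical illustration of what "workload-aware eviction" can mean, the Python sketch below ranks cached prefixes by an estimated per-category reuse probability combined with recency and evicts the lowest-scoring entries first. Every name and formula here (WorkloadAwareCache, CacheEntry, the scoring function) is an assumption for illustration, not the authors' method.

# Hypothetical sketch of a workload-aware KV-cache eviction policy.
# NOT the paper's algorithm; all names and the scoring formula are
# illustrative assumptions. Idea: rank cached prefixes by an estimated
# per-category reuse probability combined with recency, and evict the
# entries least likely to be reused, instead of plain LRU.
from __future__ import annotations

import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    key: str                 # hash of the cached token prefix
    size_bytes: int          # KV tensor footprint
    category: str            # e.g. "single-turn-api", "multi-turn-chat"
    last_access: float = field(default_factory=time.time)


class WorkloadAwareCache:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: dict[str, CacheEntry] = {}
        # Per-category reuse statistics observed from traffic (hits vs.
        # insertions); purely illustrative priors to avoid division by zero.
        self.stats = defaultdict(lambda: {"hits": 1, "inserts": 2})

    def _reuse_prob(self, entry: CacheEntry) -> float:
        s = self.stats[entry.category]
        return s["hits"] / max(s["inserts"], 1)

    def _score(self, entry: CacheEntry) -> float:
        # Lower score = better eviction candidate: low category reuse
        # probability and long time since last access.
        age = time.time() - entry.last_access
        return self._reuse_prob(entry) / (1.0 + age)

    def get(self, key: str) -> CacheEntry | None:
        entry = self.entries.get(key)
        if entry:
            entry.last_access = time.time()
            self.stats[entry.category]["hits"] += 1
        return entry

    def put(self, entry: CacheEntry) -> None:
        self.stats[entry.category]["inserts"] += 1
        while self.used + entry.size_bytes > self.capacity and self.entries:
            victim = min(self.entries.values(), key=self._score)
            self.used -= victim.size_bytes
            del self.entries[victim.key]
        self.entries[entry.key] = entry
        self.used += entry.size_bytes

Under this kind of scoring, prefixes from request categories that historically see few reuses are evicted before recently used multi-turn prefixes, which is one way a policy could exploit the predictable per-category reuse patterns the abstract describes.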

View on arXiv: https://arxiv.org/abs/2506.02634
@article{wang2025_2506.02634,
  title={KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider},
  author={Jiahao Wang and Jinbo Han and Xingda Wei and Sijie Shen and Dingyan Zhang and Chenguang Fang and Rong Chen and Wenyuan Yu and Haibo Chen},
  journal={arXiv preprint arXiv:2506.02634},
  year={2025}
}