KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference

17 March 2025
Huan Yang, Renji Zhang, Mingzhe Huang, Weijun Wang, Yin Tang, Yuanchun Li, Yunxin Liu, Deyu Zhang
Main: 9 pages · 12 figures · 1 table · Bibliography: 2 pages · Appendix: 11 pages
Abstract

This paper presents KVShare, a multi-user Key-Value (KV) Cache sharing technology based on semantic similarity, designed to enhance the inference efficiency of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Addressing the limitations of existing prefix caching (strict text prefix matching) and semantic caching (loss of response diversity), KVShare achieves fine-grained KV cache reuse through semantic alignment algorithms and differential editing operations. Experiments on real-world user conversation datasets demonstrate that KVShare improves KV cache hit rates by over 60%, while maintaining output quality comparable to full computation (no significant degradation in BLEU and ROUGE-L metrics). This approach effectively reduces GPU resource consumption and is applicable to scenarios with repetitive queries, such as healthcare and education.
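To make the abstract's mechanism concrete, here is a minimal Python sketch of the general idea: a similarity-gated cache lookup followed by diff-based partial recomputation. Everything in it is an illustrative assumption rather than the paper's algorithm: difflib's lexical SequenceMatcher stands in for KVShare's semantic alignment, the 0.8 reuse threshold is arbitrary, and compute_kv is a stub for a real prefill pass.

# Illustrative sketch only; the matcher, threshold, and KV stub are
# assumptions, not KVShare's actual semantic alignment algorithm.
from difflib import SequenceMatcher

SIM_THRESHOLD = 0.8  # hypothetical reuse threshold


class KVCacheStore:
    def __init__(self):
        self.entries = []  # (tokens, kv) pairs from earlier requests

    def lookup(self, tokens):
        """Return the most similar cached entry, if similar enough to reuse."""
        best, best_sim = None, 0.0
        for cached_tokens, kv in self.entries:
            sim = SequenceMatcher(None, cached_tokens, tokens).ratio()
            if sim > best_sim:
                best, best_sim = (cached_tokens, kv), sim
        return best if best_sim >= SIM_THRESHOLD else None

    def insert(self, tokens, kv):
        self.entries.append((tokens, kv))


def compute_kv(tokens):
    """Stand-in for a real prefill pass: one placeholder KV entry per token."""
    return [f"kv({t})" for t in tokens]


def prefill_with_reuse(store, tokens):
    """Reuse KV entries for spans aligned with a cached request and
    recompute only the differing spans (the 'differential edit')."""
    hit = store.lookup(tokens)
    if hit is None:
        kv = compute_kv(tokens)
        store.insert(tokens, kv)
        return kv, 0
    cached_tokens, cached_kv = hit
    kv, reused = [], 0
    for op, i1, i2, j1, j2 in SequenceMatcher(None, cached_tokens, tokens).get_opcodes():
        if op == "equal":
            kv.extend(cached_kv[i1:i2])           # reuse aligned KV entries
            reused += i2 - i1
        else:
            kv.extend(compute_kv(tokens[j1:j2]))  # recompute edited span
    store.insert(tokens, kv)
    return kv, reused


store = KVCacheStore()
q1 = "what are the early symptoms of type 2 diabetes".split()
q2 = "what are the common symptoms of type 2 diabetes".split()
prefill_with_reuse(store, q1)
_, reused = prefill_with_reuse(store, q2)
print(f"reused {reused}/{len(q2)} token KV entries")  # reused 8/9

Note that the sketch ignores positional and attention dependencies: in a real transformer, a reused KV entry is only valid if the prefix it attends over is preserved or corrected, which is part of what the paper's differential editing operations are designed to handle.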

@article{yang2025_2503.16525,
  title={KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse},
  author={Huan Yang and Renji Zhang and Mingzhe Huang and Weijun Wang and Yin Tang and Yuanchun Li and Yunxin Liu and Deyu Zhang},
  journal={arXiv preprint arXiv:2503.16525},
  year={2025}
}