Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models

16 May 2025
Camille Couturier
Spyros Mastorakis
Haiying Shen
Saravan Rajmohan
Victor Rühle
Abstract

Large Language Models (LLMs) are increasingly deployed across edge and cloud platforms for real-time question-answering and retrieval-augmented generation. However, processing lengthy contexts in distributed systems incurs high computational overhead, memory usage, and network bandwidth consumption. This paper introduces a novel semantic caching approach for storing and reusing intermediate contextual summaries, enabling efficient information reuse across similar queries in LLM-based QA workflows. Our method reduces redundant computations by up to 50-60% while maintaining answer accuracy comparable to full document processing, as demonstrated on NaturalQuestions, TriviaQA, and a synthetic ArXiv dataset. This approach balances computational cost and response quality, which is critical for real-time AI assistants.
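The implementation details are in the full paper; purely as an illustration of the general idea described in the abstract (reusing a cached contextual summary when a new query embeds close to a previously seen one), a minimal sketch might look like the following. All names, the similarity threshold, and the embed/summarize/generate callables are hypothetical stand-ins, not the authors' code.

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple
import math


@dataclass
class CacheEntry:
    embedding: List[float]   # embedding of the query that produced the summary
    summary: str             # cached intermediate contextual summary


@dataclass
class SemanticSummaryCache:
    """Toy semantic cache: reuse a stored contextual summary when a new
    query is semantically close enough to an earlier one."""
    embed_fn: Callable[[str], List[float]]   # any text-embedding function
    similarity_threshold: float = 0.85       # illustrative value, not from the paper
    entries: List[CacheEntry] = field(default_factory=list)

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def lookup(self, query: str) -> Optional[str]:
        """Return a cached summary if some stored query is semantically close."""
        q = self.embed_fn(query)
        best: Tuple[float, Optional[str]] = (0.0, None)
        for entry in self.entries:
            sim = self._cosine(q, entry.embedding)
            if sim > best[0]:
                best = (sim, entry.summary)
        return best[1] if best[0] >= self.similarity_threshold else None

    def store(self, query: str, summary: str) -> None:
        """Cache the intermediate summary produced while answering this query."""
        self.entries.append(CacheEntry(self.embed_fn(query), summary))


def answer(query: str, documents: List[str], cache: SemanticSummaryCache,
           summarize_fn: Callable[[str, List[str]], str],
           generate_fn: Callable[[str, str], str]) -> str:
    """On a cache hit, skip re-summarizing the full documents; otherwise
    summarize once and cache the result for similar future queries."""
    summary = cache.lookup(query)
    if summary is None:
        summary = summarize_fn(query, documents)   # expensive long-context call
        cache.store(query, summary)
    return generate_fn(query, summary)             # cheap call over the short summary


# Hypothetical usage with any embedding model and LLM wrappers:
# cache = SemanticSummaryCache(embed_fn=my_embedder)
# reply = answer("What is semantic caching?", retrieved_docs, cache,
#                my_summarizer, my_generator)

On a cache hit, the expensive long-context summarization call is skipped and only the short cached summary is passed to the generation step, which is where the compute savings would come from in a setup like this.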

@article{couturier2025_2505.11271,
  title={Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models},
  author={Camille Couturier and Spyros Mastorakis and Haiying Shen and Saravan Rajmohan and Victor Rühle},
  journal={arXiv preprint arXiv:2505.11271},
  year={2025}
}