ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring

21 April 2025
Kaili Huang
Thejas Venkatesh
Uma Dingankar
Antonio Mallia
Daniel Campos
Jian Jiao
Christopher Potts
Matei A. Zaharia
Kwabena Boahen
Omar Khattab
Saarthak Sarup
Keshav Santhanam
Abstract

We study how to serve retrieval models, specifically late-interaction models like ColBERT, to many concurrent users on a small budget, where the index may not fit in memory. We present ColBERT-serve, a novel serving system that (i) applies a memory-mapping strategy to the ColBERT index, reducing RAM usage by 90% and permitting deployment on cheap servers, and (ii) incorporates a multi-stage architecture with hybrid scoring, which reduces ColBERT's query latency and supports many concurrent queries in parallel.
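To make the two ideas in the abstract concrete, here is a minimal, hypothetical sketch in Python with NumPy. It is not ColBERT-serve's code or API. It assumes that per-token document embeddings are stored uncompressed as float32 in one flat file, whereas real ColBERT indexes compress them. The file is opened with `numpy.memmap`, so the OS pages in only the rows a query actually touches. It further assumes that a cheap first-stage retriever has already produced candidate scores (random numbers stand in for them here), and that "hybrid scoring" means a linear interpolation of min-max-normalized first-stage and ColBERT MaxSim scores. The helper names (`build_index`, `open_index`, `maxsim`, `rerank`) and the weight `alpha` are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

DIM = 8  # hypothetical token-embedding dimension, kept small for the example


def build_index(path, doc_embs):
    """Write all documents' token embeddings contiguously to disk; return row offsets."""
    lengths = [e.shape[0] for e in doc_embs]
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(int(offsets[-1]), DIM))
    for i, e in enumerate(doc_embs):
        mm[offsets[i]:offsets[i + 1]] = e
    mm.flush()
    del mm  # close the writable mapping
    return offsets


def open_index(path, n_tokens):
    # Read-only mapping: rows are paged in lazily on access instead of loading the whole file into RAM.
    return np.memmap(path, dtype=np.float32, mode="r", shape=(n_tokens, DIM))


def maxsim(q, d):
    # Late interaction: for each query token, take the max dot product over document tokens, then sum.
    return float((q @ d.T).max(axis=1).sum())


def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def rerank(q, candidates, first_stage, index, offsets, alpha=0.5, k=3):
    """Second stage: score only the first-stage candidates with MaxSim, then blend (assumed hybrid rule)."""
    late = [maxsim(q, np.asarray(index[offsets[d]:offsets[d + 1]])) for d in candidates]
    early = [first_stage[d] for d in candidates]
    hybrid = [alpha * e + (1 - alpha) * l for e, l in zip(minmax(early), minmax(late))]
    order = sorted(range(len(candidates)), key=lambda i: hybrid[i], reverse=True)
    return [candidates[i] for i in order[:k]]


rng = np.random.default_rng(0)
docs = [rng.standard_normal((int(rng.integers(3, 7)), DIM)).astype(np.float32) for _ in range(10)]
q = rng.standard_normal((4, DIM)).astype(np.float32)
# Hypothetical first-stage scores; a real system would get these from a cheap retriever.
first_stage = {i: float(rng.random()) for i in range(len(docs))}
candidates = sorted(first_stage, key=first_stage.get, reverse=True)[:5]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "colbert_tokens.f32")
    offsets = build_index(path, docs)
    index = open_index(path, int(offsets[-1]))
    top = rerank(q, candidates, first_stage, index, offsets)
    del index  # release the mapping before the temporary directory is removed

print(top)
```

Because only the handful of candidate documents are read from the mapped file, resident memory scales with the pages a query touches rather than with the full index. This is the property the sketch is meant to illustrate. The actual RAM savings and latency depend on the real system's storage layout and access patterns.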

@article{huang2025_2504.14903,
  title={ ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring },
  author={ Kaili Huang and Thejas Venkatesh and Uma Dingankar and Antonio Mallia and Daniel Campos and Jian Jiao and Christopher Potts and Matei Zaharia and Kwabena Boahen and Omar Khattab and Saarthak Sarup and Keshav Santhanam },
  journal={arXiv preprint arXiv:2504.14903},
  year={ 2025 }
}