
MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

23 May 2025

Main: 8 Pages

13 Figures

Bibliography: 2 Pages

5 Tables

Appendix: 4 Pages

Abstract

Retrieval-Augmented Generation (RAG) struggles with domain-specific enterprise datasets, which are often isolated behind firewalls and rich in complex, specialized terminology that LLMs did not see during pre-training. Semantic variability across domains such as medicine, networking, or law hampers RAG's context precision, while fine-tuning is costly, slow, and generalizes poorly as new data emerges. Achieving zero-shot precision with retrievers, without fine-tuning, remains a key challenge. We introduce 'MetaGen Blended RAG', a novel enterprise search approach that enhances semantic retrievers through a metadata generation pipeline and hybrid query indexes that use dense and sparse vectors. By leveraging key concepts, topics, and acronyms, our method creates metadata-enriched semantic indexes and boosted hybrid queries, delivering robust, scalable performance without fine-tuning. On the biomedical PubMedQA dataset, MetaGen Blended RAG achieves 82% retrieval accuracy and 77% RAG accuracy, surpassing all prior zero-shot RAG benchmarks and even rivaling fine-tuned models on that dataset, while also excelling on datasets such as SQuAD and NQ. This work redefines enterprise search through a new way of building semantic retrievers with unmatched generalization across specialized domains.
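The idea of a metadata-enriched index queried with a boosted hybrid score can be illustrated with a small sketch. This is not the authors' implementation: the token-overlap function is a toy stand-in for a sparse ranker such as BM25, the character-trigram cosine is a toy stand-in for a dense embedding model, and the names `hybrid_score`, `metadata_boost`, and `dense_weight`, the weights, and the two sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_score(query, text):
    """Fraction of query terms found in the text (toy stand-in for BM25)."""
    q = set(tokens(query))
    if not q:
        return 0.0
    return len(q & set(tokens(text))) / len(q)


def dense_score(query, text):
    """Cosine similarity of character-trigram counts (toy stand-in for embeddings)."""
    def grams(s):
        s = " ".join(tokens(s))
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    a, b = grams(query), grams(text)
    dot = sum(a[g] * b[g] for g in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(query, doc, metadata_boost=2.0, dense_weight=0.5):
    """Blend a metadata-boosted sparse score with a dense score (assumed weights)."""
    body = sparse_score(query, doc["text"])
    meta = sparse_score(query, " ".join(doc["metadata"]))
    sparse = (body + metadata_boost * meta) / (1.0 + metadata_boost)
    return (1.0 - dense_weight) * sparse + dense_weight * dense_score(query, doc["text"])


def retrieve(query, docs, k=1):
    """Return the k highest-scoring documents."""
    return sorted(docs, key=lambda d: hybrid_score(query, d), reverse=True)[:k]


# Hypothetical documents; the metadata lists play the role of generated
# key concepts, topics, and acronyms attached to each passage.
DOCS = [
    {"id": "d1",
     "text": "Patients received an inhibitor to lower blood pressure.",
     "metadata": ["ACE", "angiotensin converting enzyme", "hypertension"]},
    {"id": "d2",
     "text": "The router applies an access list to inbound packets.",
     "metadata": ["ACL", "access control list", "networking"]},
]

QUERY = "Does ACE treatment help hypertension?"
top = retrieve(QUERY, DOCS)[0]
```

In this toy setup the query shares no whole words with the body of either passage, so the match on the metadata terms is what lifts the first document; setting `metadata_boost=0.0` removes that signal.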


@article{sawarkar2025_2505.18247,
  title={MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering},
  author={Kunal Sawarkar and Shivam R. Solanki and Abhilasha Mangal},
  journal={arXiv preprint arXiv:2505.18247},
  year={2025}
}
