RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines

Abstract

Retrieval-Augmented Generation (RAG) systems show promise by coupling large language models with external knowledge, yet traditional RAG evaluation methods primarily report quantitative scores while offering limited actionable guidance for refining these complex pipelines. In this paper, we introduce RAGXplain, an evaluation framework that quantifies RAG performance and translates these assessments into clear insights that clarify the workings of the multi-stage pipeline and offer actionable recommendations. Using LLM reasoning, RAGXplain converts raw scores into coherent narratives that identify performance gaps and suggest targeted improvements. By providing transparent explanations for AI decision-making, our framework fosters user trust, a key challenge in AI adoption. Our LLM-based metric assessments show strong alignment with human judgments, and experiments on public question-answering datasets confirm that applying RAGXplain's actionable recommendations measurably improves system performance. RAGXplain thus bridges quantitative evaluation and practical optimization, empowering users to understand, trust, and enhance their AI systems.
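The score-to-guidance step described above can be illustrated with a minimal sketch. Note this is a hypothetical reconstruction, not the paper's actual implementation: the stage names, threshold, and advice strings are assumptions, and in RAGXplain an LLM would expand such flagged gaps into full narrative explanations.

```python
# Hypothetical sketch of turning per-stage RAG metric scores into
# actionable recommendations. All names and thresholds are illustrative
# assumptions; the real framework uses LLM reasoning for this step.

ADVICE = {
    "retrieval": "Increase top-k or add a hybrid (dense + BM25) retriever.",
    "reranking": "Insert a cross-encoder reranker before generation.",
    "generation": "Tighten the prompt to ground answers in retrieved context.",
}

def recommend(scores: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Return one suggestion per stage scoring below the threshold,
    ordered from weakest stage to strongest."""
    weak = sorted((s for s in scores if scores[s] < threshold), key=scores.get)
    return [f"{s} (score {scores[s]:.2f}): {ADVICE[s]}" for s in weak]

# Example: retrieval and generation underperform, reranking is fine.
for tip in recommend({"retrieval": 0.55, "reranking": 0.82, "generation": 0.64}):
    print(tip)
```

In this sketch, only stages below the threshold are surfaced, and the weakest stage is reported first, mirroring the paper's goal of prioritizing the largest performance gaps.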

@article{cohen2025_2505.13538,
  title={RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines},
  author={Dvir Cohen and Lin Burg and Gilad Barkan},
  journal={arXiv preprint arXiv:2505.13538},
  year={2025}
}