Evaluating the Retrieval Robustness of Large Language Models

28 May 2025
Shuyang Cao
Karthik Radhakrishnan
David S. Rosenberg
Steven Lu
Pengxiang Cheng
Lu Wang
Shiyue Zhang
    RALM
Main: 8 pages · 19 figures · Bibliography: 3 pages · Appendix: 8 pages
Abstract

Retrieval-augmented generation (RAG) generally enhances large language models' (LLMs') ability to solve knowledge-intensive tasks, but it can also degrade performance due to imperfect retrieval and the model's limited ability to leverage retrieved content. In this work, we evaluate the robustness of LLMs in practical RAG setups (henceforth retrieval robustness). We focus on three research questions: (1) whether RAG is always better than non-RAG; (2) whether more retrieved documents always lead to better performance; and (3) whether document order impacts results. To facilitate this study, we establish a benchmark of 1500 open-domain questions, each paired with documents retrieved from Wikipedia. We introduce three robustness metrics, each corresponding to one research question. Our comprehensive experiments, involving 11 LLMs and 3 prompting strategies, reveal that all of these LLMs exhibit surprisingly high retrieval robustness; nonetheless, different degrees of imperfect robustness hinder them from fully utilizing the benefits of RAG.
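The paper's exact metric definitions are not given on this page, but the first research question ("is RAG always better than non-RAG?") suggests a simple robustness-style measure: the fraction of questions where adding retrieval does not hurt accuracy. The sketch below is an illustrative assumption, not the authors' actual metric; the function name `rag_robustness` and the binary-correctness encoding are hypothetical.

```python
def rag_robustness(non_rag_correct, rag_correct):
    """Fraction of questions where the RAG answer is at least as good
    as the non-RAG answer (1.0 means retrieval never hurts)."""
    assert len(non_rag_correct) == len(rag_correct)
    no_harm = sum(1 for base, rag in zip(non_rag_correct, rag_correct)
                  if rag >= base)
    return no_harm / len(rag_correct)

# Toy example: per-question correctness (1 = correct, 0 = wrong).
base = [1, 0, 1, 1, 0]
rag  = [1, 1, 1, 0, 1]
print(rag_robustness(base, rag))  # 0.8: retrieval hurt on one question
```

Analogous measures could be computed per document count or per document ordering to probe the other two research questions.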

@article{cao2025_2505.21870,
  title={Evaluating the Retrieval Robustness of Large Language Models},
  author={Shuyang Cao and Karthik Radhakrishnan and David Rosenberg and Steven Lu and Pengxiang Cheng and Lu Wang and Shiyue Zhang},
  journal={arXiv preprint arXiv:2505.21870},
  year={2025}
}