


XRAG: Cross-lingual Retrieval-Augmented Generation

15 May 2025
Wei Liu
Sony Trenous
Leonardo F. R. Ribeiro
Bill Byrne
Felix Hieber
    RALM
Abstract

We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings where the user language does not match the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered. It covers the real-world scenarios of monolingual and multilingual retrieval, and provides relevancy annotations for each retrieved document. Our novel dataset construction pipeline results in questions that require complex reasoning, as evidenced by the significant gap between human and LLM performance. Consequently, XRAG serves as a valuable benchmark for studying LLM reasoning abilities, even before considering the additional cross-lingual complexity. Experimental results on five LLMs uncover two previously unreported challenges in cross-lingual RAG: 1) in the monolingual retrieval setting, all evaluated models struggle with response language correctness; 2) in the multilingual retrieval setting, the main challenge lies in reasoning over retrieved information across languages rather than generation of non-English text.

@article{liu2025_2505.10089,
  title={XRAG: Cross-lingual Retrieval-Augmented Generation},
  author={Wei Liu and Sony Trenous and Leonardo F. R. Ribeiro and Bill Byrne and Felix Hieber},
  journal={arXiv preprint arXiv:2505.10089},
  year={2025}
}