
CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning

Abstract

Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods focus on optimizing the retriever or generator in the RAG system by directly utilizing the top-k retrieved documents. However, the effectiveness of these documents varies significantly across user queries, i.e., some documents provide valuable knowledge while others lack critical information entirely. This variability hinders the adaptation of the retriever and generator during training. Inspired by human cognitive learning, curriculum learning trains models on samples that progress from easy to difficult, thereby enhancing their generalization ability; we integrate this effective paradigm into the training of the RAG system. In this paper, we propose a multi-stage curriculum-learning-based RAG training framework, named CL-RAG. We first construct training data with multiple difficulty levels for the retriever and generator separately through sample evolution. We then train the models in stages following the curriculum learning approach, thereby optimizing the overall performance and generalization of the RAG system more effectively. Our CL-RAG framework demonstrates consistent effectiveness across four open-domain QA datasets, achieving performance gains of 2% to 4% over multiple advanced methods.
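The staged training schedule the abstract describes, grouping samples by difficulty and training from easy to hard, can be sketched as below. This is a minimal illustration under assumed simplifications: the three-level difficulty split, the accumulating sample pool, and the names curriculum_train and train_one_epoch are hypothetical and not taken from the paper.

```python
# Minimal sketch of an easy-to-hard curriculum training schedule.
# The difficulty labels, stage ordering, and train_one_epoch stub are
# illustrative assumptions, not the paper's actual implementation.

from typing import Callable, Dict, List, Sequence


def curriculum_train(
    samples_by_difficulty: Dict[str, List[dict]],
    train_one_epoch: Callable[[List[dict]], None],
    stage_order: Sequence[str] = ("easy", "medium", "hard"),
    epochs_per_stage: int = 1,
) -> None:
    """Train in stages, adding harder samples as the stages progress."""
    pool: List[dict] = []
    for stage in stage_order:
        # Add the current difficulty level to the training pool so that
        # later stages still rehearse the earlier, easier samples.
        pool.extend(samples_by_difficulty.get(stage, []))
        for _ in range(epochs_per_stage):
            train_one_epoch(pool)


if __name__ == "__main__":
    # Dummy data and a stub trainer, purely to show the control flow.
    data = {
        "easy": [{"query": "q1", "docs": ["gold evidence"]}],
        "medium": [{"query": "q2", "docs": ["partial evidence", "noise"]}],
        "hard": [{"query": "q3", "docs": ["distractor", "distractor"]}],
    }
    curriculum_train(data, lambda batch: print(f"training on {len(batch)} samples"))
```

Accumulating the pool rather than replacing it is one common curriculum variant, chosen here so the model does not forget easier samples; the paper's exact per-stage schedule for the retriever and generator may differ.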

@article{wang2025_2505.10493,
  title={CL-RAG: Bridging the Gap in Retrieval-Augmented Generation with Curriculum Learning},
  author={Shaohan Wang and Licheng Zhang and Zheren Fu and Zhendong Mao},
  journal={arXiv preprint arXiv:2505.10493},
  year={2025}
}