R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

22 May 2025

Main: 7 Pages

3 Figures

Bibliography: 3 Pages

6 Tables

Appendix: 4 Pages

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at this https URL.
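
To make the idea of an outcome-supervised reward that also favors internal knowledge more concrete, here is a minimal illustrative sketch. The abstract does not specify the actual reward, so everything below is an assumption made for illustration: the function names `normalize` and `toy_reward`, the exact-match criterion, and the `bonus` and `penalty_per_call` parameters are all hypothetical and are not taken from the paper.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def toy_reward(
    prediction: str,
    gold: str,
    num_retrievals: int,
    bonus: float = 0.5,             # hypothetical: max bonus for using no retrieval
    penalty_per_call: float = 0.1,  # hypothetical: bonus lost per retrieval call
) -> float:
    """Hypothetical reward: outcome term plus an internal-knowledge bonus.

    This is NOT the reward used by R1-Searcher++; it only illustrates the
    general shape of such a reward under the assumptions stated above.
    """
    # Outcome supervision: only the final answer is checked, not the steps.
    correct = normalize(prediction) == normalize(gold)
    if not correct:
        return 0.0
    # A correct answer earns a bonus that shrinks with each retrieval call,
    # so answering from internal knowledge scores highest.
    internal_bonus = max(0.0, bonus - penalty_per_call * num_retrievals)
    return 1.0 + internal_bonus
```

Under these assumed parameters, a correct answer produced without any retrieval scores higher than a correct answer that needed several searches, while an incorrect answer scores zero regardless of how many searches were issued.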


@article{song2025_2505.17005,
  title={R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning},
  author={Huatong Song and Jinhao Jiang and Wenqing Tian and Zhipeng Chen and Yuhuan Wu and Jiahao Zhao and Yingqian Min and Wayne Xin Zhao and Lei Fang and Ji-Rong Wen},
  journal={arXiv preprint arXiv:2505.17005},
  year={2025}
}
