R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

Large language models (LLMs) have made notable progress in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning-search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning-Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction and to learn optimal reasoning-search interaction trajectories via multi-reward signals, improving response quality on complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training, R-Search provides multi-stage, multi-type rewards to jointly optimize the reasoning-search trajectory. Experiments on seven datasets show that R-Search outperforms advanced RAG baselines by up to 32.2% (in-domain) and 25.1% (out-of-domain). The code and data are available at this https URL.
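The abstract describes an interleaved reason/retrieve loop trained against multi-stage, multi-type rewards. The Python sketch below illustrates one plausible shape of such a loop and a composite reward under stated assumptions: the tag tokens, function names, reward components, and weights are all illustrative placeholders, not the paper's actual implementation.

# A minimal, hypothetical sketch of a reasoning-search interaction loop with a
# multi-reward signal, in the spirit of R-Search. All names, reward components,
# and weights below are illustrative assumptions, not the paper's API.

from dataclasses import dataclass, field

SEARCH_TAG = "<search>"   # assumed special token that triggers retrieval
ANSWER_TAG = "<answer>"   # assumed special token that ends the trajectory

def llm_generate(prompt: str) -> str:
    """Stub for one LLM policy step; a real system would call the model here."""
    return f"{ANSWER_TAG} stub answer"

def search(query: str) -> str:
    """Stub retriever; a real system would query a search engine or index."""
    return f"[retrieved evidence for: {query}]"

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    answer: str = ""

def rollout(question: str, max_turns: int = 4) -> Trajectory:
    """Interleave reasoning and retrieval until the model emits an answer."""
    traj = Trajectory()
    context = question
    for _ in range(max_turns):
        step = llm_generate(context)
        traj.steps.append(step)
        if ANSWER_TAG in step:                       # model decided to answer
            traj.answer = step.split(ANSWER_TAG, 1)[1].strip()
            break
        if SEARCH_TAG in step:                       # model decided to retrieve
            query = step.split(SEARCH_TAG, 1)[1].strip()
            context += f"\n{step}\n{search(query)}"  # fold evidence back in
        else:
            context += f"\n{step}"                   # pure reasoning step
    return traj

def multi_reward(traj: Trajectory, gold: str) -> float:
    """Combine several reward signals; the weights are made-up placeholders."""
    answer_r = 1.0 if gold.lower() in traj.answer.lower() else 0.0       # outcome
    format_r = 1.0 if traj.answer else 0.0                               # valid format
    evidence_r = min(1.0, sum(SEARCH_TAG in s for s in traj.steps) / 2)  # used search
    return 0.7 * answer_r + 0.2 * evidence_r + 0.1 * format_r

traj = rollout("Who wrote 'The Selfish Gene'?")
print(multi_reward(traj, gold="Richard Dawkins"))

A scalar combination of rewards is only one option; the paper's "multi-stage, multi-type" phrasing suggests rewards may instead be applied at different points in the trajectory, which this sketch does not attempt to capture.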
@article{zhao2025_2506.04185,
  title   = {R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning},
  author  = {Qingfei Zhao and Ruobing Wang and Dingling Xu and Daren Zha and Limin Liu},
  journal = {arXiv preprint arXiv:2506.04185},
  year    = {2025}
}