O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL

Yi Yao
He Zhu
Piaohong Wang
Jincheng Ren
Xinlong Yang
Qianben Chen
Xiaowan Li
Dingfeng Shi
Jiaxian Li
Qiexiang Wang
Sinuo Wang
Xinpeng Liu
Jiaqi Wu
Minghao Liu
Wangchunshu Zhou
Main: 2 pages · 4 figures · 5 tables · Bibliography: 1 page · Appendix: 19 pages
Abstract

The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow in which collaborative AI agents simulate complex tool-integrated reasoning to generate diverse, high-fidelity data end-to-end. Leveraging this synthesized data, we develop a two-stage training strategy that combines supervised fine-tuning with a novel reinforcement learning method, designed to maximize model alignment and capability. Extensive experiments demonstrate that our framework empowers open-source models across multiple scales, enabling them to achieve new state-of-the-art performance on a major deep research benchmark. This work provides a scalable and effective pathway for advancing open-source LLMs without relying on proprietary data or models.
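The pipeline the abstract describes (multi-agent trace synthesis, then supervised fine-tuning, then reinforcement learning) can be sketched schematically as below. This is a minimal toy illustration only: every name (`Trace`, `synthesize_traces`, `sft_stage`, `rl_stage`, the dict-based "model") is a hypothetical stand-in, not the paper's actual implementation or API.

```python
# Toy sketch of the two-stage pipeline described in the abstract:
# (0) collaborative agents synthesize tool-integrated reasoning traces,
# (1) supervised fine-tuning (SFT) on those traces,
# (2) reinforcement learning (RL) that scores trajectories with a reward.
# All names are illustrative assumptions, not the authors' code.

from dataclasses import dataclass, field

@dataclass
class Trace:
    question: str
    steps: list = field(default_factory=list)  # tool calls + reasoning steps
    answer: str = ""

def synthesize_traces(questions):
    """Stage 0: agents produce research-grade trajectories (stubbed here)."""
    traces = []
    for q in questions:
        t = Trace(question=q)
        t.steps = [f"search({q!r})", "read(top_result)", "summarize(evidence)"]
        t.answer = f"synthesized answer for {q!r}"
        traces.append(t)
    return traces

def sft_stage(model, traces):
    """Stage 1: imitate the synthesized trajectories (toy 'training')."""
    model["policy"] = {t.question: t.answer for t in traces}
    return model

def rl_stage(model, traces, reward_fn):
    """Stage 2: reinforce trajectories that score well under reward_fn."""
    for t in traces:
        model.setdefault("scores", {})[t.question] = reward_fn(t)
    return model

model = {}
traces = synthesize_traces(["Why is the sky blue?"])
model = sft_stage(model, traces)
# Reward here is a placeholder: 1.0 if the trace produced any answer at all.
model = rl_stage(model, traces, reward_fn=lambda t: float(bool(t.answer)))
```

The two stages are deliberately separated: SFT bootstraps the policy from distilled traces, and RL then optimizes it against a reward, mirroring the "supervised fine-tuning with a novel reinforcement learning method" described above.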
