ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition

27 March 2025
Yujie Liu
Zonglin Yang
Tong Xie
Jinjie Ni
Ben Gao
Yuqiang Li
Shixiang Tang
Wanli Ouyang
Erik Cambria
Dongzhan Zhou
Abstract

Large language models (LLMs) have demonstrated potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined due to the lack of a dedicated benchmark. To address this gap, we introduce the first large-scale benchmark for evaluating LLMs with a near-sufficient set of sub-tasks of scientific discovery: inspiration retrieval, hypothesis composition, and hypothesis ranking. We develop an automated framework that extracts critical components (research questions, background surveys, inspirations, and hypotheses) from scientific papers across 12 disciplines, with expert validation confirming its accuracy. To prevent data contamination, we focus exclusively on papers published in 2024, ensuring minimal overlap with LLM pretraining data. Our evaluation reveals that LLMs perform well in retrieving inspirations, an out-of-distribution task, suggesting their ability to surface novel knowledge associations. This positions LLMs as "research hypothesis mines", capable of facilitating automated scientific discovery by generating innovative hypotheses at scale with minimal human intervention.
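
To make the task decomposition concrete, below is a minimal Python sketch of how the first sub-task, inspiration retrieval, might be scored. The PaperRecord schema, the retrieve_inspirations interface, and the hit-rate-at-k metric are illustrative assumptions for exposition, not the authors' released code or data format.

# Minimal sketch of scoring the inspiration-retrieval sub-task.
# The record schema, retriever interface, and metric are illustrative
# assumptions, not the benchmark's released format.
from dataclasses import dataclass


@dataclass
class PaperRecord:
    """One benchmark instance extracted from a 2024 paper (hypothetical schema)."""
    research_question: str
    background_survey: str
    true_inspirations: list[str]  # ground-truth inspiration paper IDs
    candidate_pool: list[str]     # distractors mixed with the true inspirations


def retrieve_inspirations(record: PaperRecord, k: int) -> list[str]:
    """Placeholder for an LLM-backed retriever: rank record.candidate_pool
    against the research question and return the top-k candidate IDs."""
    raise NotImplementedError("plug an LLM ranking call in here")


def hit_rate_at_k(records: list[PaperRecord], k: int = 5) -> float:
    """Fraction of instances whose top-k retrieved candidates contain
    at least one ground-truth inspiration."""
    hits = 0
    for rec in records:
        if set(retrieve_inspirations(rec, k)) & set(rec.true_inspirations):
            hits += 1
    return hits / len(records) if records else 0.0

Under this reading, the composition and ranking sub-tasks would reuse the same records: composition conditions a model on the question, survey, and retrieved inspirations to generate a hypothesis, while ranking orders candidate hypotheses against the paper's ground truth.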

@article{liu2025_2503.21248,
  title={ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition},
  author={Yujie Liu and Zonglin Yang and Tong Xie and Jinjie Ni and Ben Gao and Yuqiang Li and Shixiang Tang and Wanli Ouyang and Erik Cambria and Dongzhan Zhou},
  journal={arXiv preprint arXiv:2503.21248},
  year={2025}
}