
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

Abstract

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, where an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail to acquire or leverage the information needed to solve tasks, revealing a stark divergence between their passive and active reasoning abilities. Moreover, ablation studies indicate that even advanced strategies, such as tree-based searching or post-training approaches, yield only modest gains and fall short of the levels required for real-world deployment. Collectively, these findings highlight the critical need to advance methods for active reasoning, e.g., by incorporating interactive learning, real-time feedback loops, and environment-aware training objectives. The benchmark is publicly available at: this https URL.
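To make the active-reasoning setup concrete, below is a minimal sketch of the kind of interaction loop the guessing-numbers task family implies: an environment holds hidden information, and the model must repeatedly ask queries and use the returned feedback. This is an illustrative sketch only; the function names (`ask_model`, `feedback`, `run_episode`) and the random-guessing policy are assumptions for exposition, not part of the released AR-Bench code.

```python
# Minimal sketch of an active-reasoning loop for a guessing-numbers task
# (bulls-and-cows style). The environment holds a hidden 4-digit secret;
# the model must query it and reason over the feedback. `ask_model` is a
# placeholder for an LLM call and is NOT from the AR-Bench release.

import random


def feedback(secret: str, guess: str) -> tuple[int, int]:
    """Return (bulls, cows): correct digit in correct place, correct digit in wrong place."""
    bulls = sum(s == g for s, g in zip(secret, guess))
    cows = sum(min(secret.count(d), guess.count(d)) for d in set(guess)) - bulls
    return bulls, cows


def ask_model(history: list[tuple[str, int, int]]) -> str:
    # Placeholder policy: a real evaluation would prompt an LLM with the
    # interaction history and parse its next guess. Here we just sample a
    # 4-digit guess with distinct digits.
    return "".join(random.sample("0123456789", 4))


def run_episode(secret: str, max_turns: int = 10) -> bool:
    """Run one episode; return True if the secret is found within the turn budget."""
    history: list[tuple[str, int, int]] = []
    for _ in range(max_turns):
        guess = ask_model(history)
        bulls, cows = feedback(secret, guess)
        history.append((guess, bulls, cows))
        if bulls == len(secret):
            return True
    return False


if __name__ == "__main__":
    print(run_episode("0427"))
```

Under this framing, success depends on asking informative queries under a limited interaction budget, which is precisely the capability the abstract reports current LLMs struggle with.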

@article{zhou2025_2506.08295,
  title={From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?},
  author={Zhanke Zhou and Xiao Feng and Zhaocheng Zhu and Jiangchao Yao and Sanmi Koyejo and Bo Han},
  journal={arXiv preprint arXiv:2506.08295},
  year={2025}
}