Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models

16 May 2025
Blanca Calvo Figueras
Rodrigo Agerri
Communities: ALM · ELM · LRM
ArXiv · PDF · HTML
Abstract

The task of Critical Questions Generation (CQs-Gen) aims to foster critical thinking by enabling systems to generate questions that expose assumptions and challenge the reasoning in arguments. Despite growing interest in this area, progress has been hindered by the lack of suitable datasets and automatic evaluation standards. This work presents a comprehensive approach to support the development and benchmarking of systems for this task. We construct the first large-scale manually annotated dataset. We also investigate automatic evaluation methods and identify a reference-based technique using large language models (LLMs) as the strategy that best correlates with human judgments. Our zero-shot evaluation of 11 LLMs establishes a strong baseline while showcasing the difficulty of the task. Data, code, and a public leaderboard are provided to encourage further research, not only on model performance but also on the practical benefits of CQs-Gen for both automated reasoning and human critical thinking.
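The abstract describes two LLM-based components: zero-shot generation of critical questions and reference-based evaluation by a judge model. The sketch below illustrates what such a pipeline could look like, assuming the OpenAI Python client as the backing LLM; the model name, prompt wording, and YES/NO matching criterion are illustrative assumptions, not the paper's exact protocol (the authors' released code defines the actual prompts and scoring).

# Minimal sketch of the pipeline described in the abstract, assuming the
# OpenAI Python client. The model name, prompts, and YES/NO matching
# criterion are illustrative assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()      # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # placeholder; the paper benchmarks 11 different LLMs

def generate_critical_questions(argument: str, n: int = 3) -> list[str]:
    """Zero-shot CQs-Gen: ask the model for questions that expose
    assumptions or challenge the reasoning in an argument."""
    prompt = (
        f"Read the following argument and write {n} critical questions "
        "that expose its assumptions or challenge its reasoning. "
        "Return one question per line.\n\n"
        f"Argument: {argument}"
    )
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    return [line.strip() for line in lines if line.strip()]

def matches_a_reference(candidate: str, references: list[str]) -> bool:
    """Reference-based evaluation: a judge LLM decides whether the
    candidate question asks the same thing as any annotated reference."""
    prompt = (
        "Reference critical questions:\n"
        + "\n".join(f"- {ref}" for ref in references)
        + f"\n\nCandidate question: {candidate}\n"
        "Does the candidate ask essentially the same thing as any of the "
        "reference questions? Answer YES or NO."
    )
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

Under this sketch, scoring a system reduces to generating questions per argument and counting how many match the manually annotated references; the paper's finding is that this kind of reference-based LLM judging correlates best with human judgments among the automatic evaluation methods compared.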

@article{figueras2025_2505.11341,
  title={Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models},
  author={Blanca Calvo Figueras and Rodrigo Agerri},
  journal={arXiv preprint arXiv:2505.11341},
  year={2025}
}