MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models

16 May 2025
Xiaomin Li
Mingye Gao
Yuexing Hao
Taoran Li
Guangya Wan
Zihan Wang
Yijun Wang
Abstract

Clinical guidelines, typically structured as decision trees, are central to evidence-based medical practice and critical for ensuring safe and accurate diagnostic decision-making. However, it remains unclear whether Large Language Models (LLMs) can reliably follow such structured protocols. In this work, we introduce MedGUIDE, a new benchmark for evaluating LLMs on their ability to make guideline-consistent clinical decisions. MedGUIDE is constructed from 55 curated NCCN decision trees across 17 cancer types and uses clinical scenarios generated by LLMs to create a large pool of multiple-choice diagnostic questions. We apply a two-stage quality selection process, combining expert-labeled reward models and LLM-as-a-judge ensembles across ten clinical and linguistic criteria, to select 7,747 high-quality samples. We evaluate 25 LLMs spanning general-purpose, open-source, and medically specialized models, and find that even domain-specific LLMs often underperform on tasks requiring structured guideline adherence. We also test whether performance can be improved via in-context guideline inclusion or continued pretraining. Our findings underscore the importance of MedGUIDE in assessing whether LLMs can operate safely within the procedural frameworks expected in real-world clinical settings.
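The abstract describes scoring models on multiple-choice diagnostic questions derived from decision trees. A minimal sketch of such an evaluation loop is below; the sample fields, prompt wording, and model interface are illustrative assumptions, not the benchmark's actual API.

```python
# Hypothetical sketch of a guideline-based multiple-choice evaluation loop.
# Field names ('scenario', 'choices', 'answer') and the callable model
# interface are assumptions for illustration.

def evaluate(model, samples):
    """Return accuracy of `model` on multiple-choice clinical questions.

    `model` is any callable mapping a prompt string to an answer letter;
    `samples` is a list of dicts with 'scenario', 'choices', 'answer'.
    """
    correct = 0
    for s in samples:
        options = "\n".join(
            f"{letter}. {text}"
            for letter, text in zip("ABCD", s["choices"])
        )
        prompt = (
            f"Clinical scenario:\n{s['scenario']}\n\n"
            f"Which next step follows the guideline?\n{options}\n"
            "Answer with a single letter."
        )
        prediction = model(prompt).strip().upper()[:1]
        correct += prediction == s["answer"]
    return correct / len(samples)


# Toy usage with a stub model that always answers "B".
samples = [
    {"scenario": "Stage I disease, no nodal involvement.",
     "choices": ["Chemotherapy", "Surveillance", "Radiation", "Biopsy"],
     "answer": "B"},
]
accuracy = evaluate(lambda prompt: "B", samples)
print(accuracy)  # 1.0
```

In-context guideline inclusion, one of the interventions the paper tests, would amount to prepending the relevant decision-tree text to the prompt before querying the model.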

@article{li2025_2505.11613,
  title={MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models},
  author={Xiaomin Li and Mingye Gao and Yuexing Hao and Taoran Li and Guangya Wan and Zihan Wang and Yijun Wang},
  journal={arXiv preprint arXiv:2505.11613},
  year={2025}
}