Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

22 March 2025
Zehan Li
Yiying Yang
Jiping Lang
Wenhao Jiang
Yuhang Zhao
Shuang Li
Dingqian Wang
Zhu Lin
Xuanna Li
Yuze Tang
Jiexian Qiu
Xiaolin Lu
Hongji Yu
Shuang Chen
Yuhua Bi
Xiaofei Zeng
Yixian Chen
Junrong Chen
Lin Yao
Communities: AI4MH, LM&MA, ELM
Abstract

Large Language Models (LLMs) have demonstrated considerable potential in general practice. However, existing benchmarks and evaluation frameworks primarily depend on exam-style or simplified question-answer formats, lacking a competency-based structure aligned with the real-world clinical responsibilities encountered in general practice. Consequently, the extent to which LLMs can reliably fulfill the duties of general practitioners (GPs) remains uncertain. In this work, we propose a novel evaluation framework to assess the capability of LLMs to function as GPs. Based on this framework, we introduce a general practice benchmark (GPBench), whose data are meticulously annotated by domain experts in accordance with routine clinical practice standards. We evaluate ten state-of-the-art LLMs and analyze their competencies. Our findings indicate that current LLMs are not yet ready for deployment in general practice without human oversight, and further optimization specifically tailored to the daily responsibilities of GPs is essential.
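
The abstract does not specify GPBench's item format, competency taxonomy, or scoring rules, so the following is only a minimal, hypothetical sketch of how a benchmark evaluation loop of this kind could be wired up. It assumes items are JSONL records with expert-annotated reference answers ("gpbench_items.jsonl" is a made-up filename), that each model is a simple text-in/text-out callable, and that exact-match accuracy stands in for whatever expert-designed rubric the paper actually uses.

"""Minimal sketch of a benchmark-style evaluation loop (assumptions noted above).

Not the authors' implementation: GPBench's real item schema, competency
labels, and scoring rubric may differ from what is assumed here.
"""
import json
from typing import Callable, Dict, List


def load_items(path: str) -> List[Dict]:
    # Assumed format: one JSON object per line, e.g.
    # {"question": "...", "answer": "...", "competency": "diagnosis"}
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


def evaluate(model: Callable[[str], str], items: List[Dict]) -> Dict[str, float]:
    """Return per-competency accuracy for a single model."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for item in items:
        comp = item.get("competency", "overall")
        prediction = model(item["question"]).strip().lower()
        reference = item["answer"].strip().lower()
        total[comp] = total.get(comp, 0) + 1
        # Exact string match; a real benchmark would likely use expert rubrics
        # or graded scoring rather than this simplification.
        if prediction == reference:
            correct[comp] = correct.get(comp, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in total.items()}


if __name__ == "__main__":
    # Placeholder model that always answers "a"; swap in a real LLM client.
    dummy_model = lambda prompt: "a"
    scores = evaluate(dummy_model, load_items("gpbench_items.jsonl"))
    for competency, acc in sorted(scores.items()):
        print(f"{competency}: {acc:.2%}")

Reporting accuracy per competency rather than a single aggregate score mirrors the competency-based framing described in the abstract, where the question is which GP duties a model can and cannot fulfill.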

View on arXiv
@article{li2025_2503.17599,
  title={Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark},
  author={Zheqing Li and Yiying Yang and Jiping Lang and Wenhao Jiang and Yuhang Zhao and Shuang Li and Dingqian Wang and Zhu Lin and Xuanna Li and Yuze Tang and Jiexian Qiu and Xiaolin Lu and Hongji Yu and Shuang Chen and Yuhua Bi and Xiaofei Zeng and Yixian Chen and Junrong Chen and Lin Yao},
  journal={arXiv preprint arXiv:2503.17599},
  year={2025}
}