arXiv:2312.05256

Holistic Evaluation of GPT-4V for Biomedical Imaging

10 November 2023
Zheng Liu
Hanqi Jiang
Tianyang Zhong
Zihao Wu
Chong Ma
Yiwei Li
Xiao-Xing Yu
Yutong Zhang
Yi Pan
Peng Shu
Yanjun Lyu
Lu Zhang
Junjie Yao
Peixin Dong
Chao-Yang Cao
Zhe Xiao
Jiaqi Wang
Huan Zhao
Shaochen Xu
Yaonai Wei
Jingyuan Chen
Haixing Dai
Peilong Wang
Haoyang He
Zewei Wang
Xinyu Wang
Xu-Yao Zhang
Lin Zhao
Yi-Hsueh Liu
Kai Zhang
Li Yan
Lichao Sun
Jun Liu
Ning Qiang
Bao Ge
Xiaoyan Cai
Shijie Zhao
Xintao Hu
Yi Yuan
Gang Li
Shu Zhang
Xin Zhang
Xi Jiang
Tuo Zhang
Dinggang Shen
Quanzheng Li
Wei Liu
Xiang Li
Dajiang Zhu
Tianming Liu
Abstract

In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
