ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images

9 February 2025
Hongyu Ge
Longkun Hao
Zihui Xu
Zhenxin Lin
Bin Li
Shoujun Zhou
Hongjin Zhao
Yihang Liu
Abstract

Medical Visual Question Answering (Med-VQA) is a critical and challenging subtask within the general VQA domain. Despite significant advancements in general Visual Question Answering (VQA), multimodal large language models (MLLMs) still exhibit substantial limitations when handling multi-task VQA scenarios. These limitations manifest as erroneous spatial localization and misinterpretation of medical images, and primarily arise from two fundamental issues: inadequate image-text alignment and insufficient medical knowledge in general-purpose MLLMs for specialized medical applications. To address these issues, we introduce the Cross-Modal Clinical Knowledge Distiller (ClinKD), a framework designed to enhance image-text alignment and provide a more effective mechanism for adapting MLLMs to medical knowledge. Extensive experimental evaluations demonstrate that ClinKD achieves state-of-the-art performance on Med-GRIT-270k, a challenging medical benchmark containing fine-grained multi-task QA pairs. The results indicate that our approach not only significantly improves image-text alignment but also effectively enables MLLMs to adapt to medical knowledge. The source code for ClinKD is available at: this https URL.
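
For readers unfamiliar with the distillation idea the abstract refers to, the sketch below shows a generic teacher-student knowledge-distillation objective. It is an illustrative assumption only: the function name, temperature, and loss weighting are not taken from the paper, and this does not reproduce ClinKD's actual cross-modal training procedure, which is described in the paper and repository.

```python
# Minimal sketch of a generic knowledge-distillation loss (Hinton-style),
# shown only to illustrate the distillation concept the abstract mentions.
# The names, temperature, and alpha weighting are illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combine a soft KL term (student mimics the teacher's distribution)
    with the usual hard cross-entropy term on ground-truth answer tokens."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the reference labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```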

@article{ge2025_2502.05928,
  title={ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images},
  author={Hongyu Ge and Longkun Hao and Zihui Xu and Zhenxin Lin and Bin Li and Shoujun Zhou and Hongjin Zhao and Yihang Liu},
  journal={arXiv preprint arXiv:2502.05928},
  year={2025}
}