Electronic Health Record (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes CKD-EHR (Clinical Knowledge Distillation for EHR), a framework that achieves efficient and accurate disease risk prediction through knowledge distillation. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model. The teacher then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model: diagnostic accuracy increases by 9%, F1-score improves by 27%, and inference is 22.2 times faster. This solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach to resource optimization in clinical settings. The code and data for this research are available at https://github.com/209506702/CKD_EHR.
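The abstract does not spell out the training objective, but the teacher-to-student soft-label transfer it describes is typically implemented as a temperature-scaled distillation loss. The sketch below is a minimal, generic illustration of that step, not the paper's actual multi-granularity attention distillation mechanism; the function name, the temperature, and the alpha weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label distillation objective (assumed, not the paper's
    exact loss): a weighted sum of KL divergence against the teacher's
    temperature-softened distribution and cross-entropy on hard labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so gradient magnitudes stay comparable
    # across temperatures (standard practice from Hinton et al., 2015).
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 records scored over 10 hypothetical diagnosis classes.
student_logits = torch.randn(4, 10)  # from the lightweight BERT student
teacher_logits = torch.randn(4, 10)  # from the fine-tuned Qwen2.5-7B teacher
labels = torch.randint(0, 10, (4,))  # ground-truth diagnosis indices
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In this standard formulation, the teacher's softened output distribution carries inter-class information that hard labels discard, which is what lets a small student approach the teacher's accuracy at a fraction of the inference cost.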
@article{wang2025_2506.15118,
  title   = {CKD-EHR: Clinical Knowledge Distillation for Electronic Health Records},
  author  = {Junke Wang and Hongshun Ling and Li Zhang and Longqian Zhang and Fang Wang and Yuan Gao and Zhi Li},
  journal = {arXiv preprint arXiv:2506.15118},
  year    = {2025}
}