Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification

Abstract
Background: Deep learning has significantly advanced medical image analysis, with Vision Transformers (ViTs) offering a powerful alternative to convolutional models by capturing long-range dependencies through self-attention. However, ViTs are inherently data-intensive and lack domain-specific inductive biases, which limits their applicability in medical imaging. In contrast, radiomics provides interpretable, handcrafted descriptors of tissue heterogeneity but suffers from limited scalability and poor integration into end-to-end learning frameworks. In this work, we propose the Radiomics-Embedded Vision Transformer (RE-ViT), which combines radiomic features with data-driven visual embeddings within a ViT backbone.
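The core idea of combining handcrafted radiomic features with learned patch embeddings inside a ViT backbone could be sketched as follows. This is a minimal illustration under assumed choices, not the paper's actual design: the dimensions, the projection weights, and the fusion scheme (projecting the radiomics vector to a single token prepended to the patch sequence) are all hypothetical.

```python
import numpy as np

# Hedged sketch of RE-ViT-style fusion: a handcrafted radiomics vector is
# projected into the ViT embedding space and joined with the patch tokens,
# so self-attention can mix both representations. All shapes are assumed.

rng = np.random.default_rng(0)

embed_dim = 64            # assumed ViT embedding width
num_patches = 16          # e.g. a 4x4 grid of image patches
num_radiomic_feats = 107  # assumed size of the handcrafted radiomics vector

# Data-driven visual embeddings: one vector per image patch.
patch_embeddings = rng.standard_normal((num_patches, embed_dim))

# Handcrafted radiomic descriptors computed for the same image/ROI.
radiomic_features = rng.standard_normal(num_radiomic_feats)

# A (normally learnable) linear projection maps radiomics into the
# embedding space; here it is random for illustration only.
W_proj = rng.standard_normal((num_radiomic_feats, embed_dim)) * 0.02
radiomic_token = radiomic_features @ W_proj  # shape: (embed_dim,)

# Prepend the radiomic token to the patch sequence (a class token, omitted
# here, would typically be prepended as well) before the transformer blocks.
tokens = np.vstack([radiomic_token[None, :], patch_embeddings])
print(tokens.shape)  # (17, 64): one radiomic token + 16 patch tokens
```

In this sketch the transformer then attends over the combined sequence, letting the radiomic token interact with every patch embedding; whether RE-ViT fuses at the token level or elsewhere in the backbone is not specified by the abstract.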
@article{yang2025_2504.10916,
  title={Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification},
  author={Zhenyu Yang and Haiming Zhu and Rihui Zhang and Haipeng Zhang and Jianliang Wang and Chunhao Wang and Minbin Chen and Fang-Fang Yin},
  journal={arXiv preprint arXiv:2504.10916},
  year={2025}
}