Multimodal Survival Modeling in the Age of Foundation Models

Abstract

The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference through its harmonized genomic, clinical, and imaging data. Prior studies have trained bespoke cancer survival prediction models on unimodal or multimodal TCGA data. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) that derive meaningful feature embeddings agnostic to any specific modeling task. FM development has been especially active for biomedical text. Although TCGA contains free-text data in the form of pathology reports, these have historically been underutilized. Here, we investigate the feasibility of training classical, multimodal survival models over zero-shot embeddings extracted by FMs. We show that multimodal fusion is easy to perform and has an additive effect, with multimodal models outperforming unimodal ones. We demonstrate the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we modernize survival modeling by leveraging FMs and information extraction from pathology reports.
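
To make the idea of fusing FM embeddings concrete, the following is a minimal sketch and not the authors' implementation. It assumes that fusion is done by simple concatenation of per-patient embedding vectors (one common choice; the abstract does not say which fusion method is used) and that models are compared with Harrell's concordance index (a standard survival metric, also an assumption here). The function names `fuse_embeddings` and `concordance_index` are hypothetical.

```python
from typing import List, Sequence


def fuse_embeddings(*modalities: Sequence[Sequence[float]]) -> List[List[float]]:
    """Early fusion (assumed): concatenate each patient's embeddings across modalities.

    Each argument is one modality, given as a list of per-patient vectors
    in the same patient order.
    """
    n_patients = len(modalities[0])
    if any(len(m) != n_patients for m in modalities):
        raise ValueError("every modality must cover the same patients")
    return [
        [value for modality in modalities for value in modality[i]]
        for i in range(n_patients)
    ]


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before time j. Tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: two patients, a 2-d "text" embedding and a 1-d "image" embedding.
text_emb = [[0.1, 0.2], [0.3, 0.4]]
image_emb = [[0.5], [0.6]]
fused = fuse_embeddings(text_emb, image_emb)
```

In this sketch, the fused vectors would be passed to any classical survival model (for example, a Cox proportional hazards model), and the model's predicted risks would be scored with `concordance_index` against the unimodal baselines.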

@article{song2025_2505.07683,
  title={Multimodal Survival Modeling in the Age of Foundation Models},
  author={Steven Song and Morgan Borjigin-Wang and Irene Madejski and Robert L. Grossman},
  journal={arXiv preprint arXiv:2505.07683},
  year={2025}
}