Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model

Generating 3D CT volumes from descriptive free-text inputs presents a transformative opportunity in diagnostics and research. In this paper, we introduce Text2CT, a novel approach for synthesizing 3D CT volumes from textual descriptions using a diffusion model. Unlike previous methods that rely on fixed-format text input, Text2CT employs a novel prompt formulation that enables generation from diverse, free-text descriptions. The proposed framework encodes medical text into latent representations and decodes them into high-resolution 3D CT scans, effectively bridging the gap between semantic text inputs and detailed volumetric representations in a unified 3D framework. Our method demonstrates superior performance in preserving anatomical fidelity and capturing intricate structures as described in the input text. Extensive evaluations show that our approach achieves state-of-the-art results, offering promising applications in diagnostics and data augmentation.
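To make the encode-then-decode idea concrete, below is a minimal sketch of text-conditioned 3D diffusion sampling: a text embedding conditions a noise-prediction network, and ancestral denoising turns Gaussian noise into a small volume. All names and settings here (TinyDenoiser3D, the embedding size, the 32^3 volume shape, the linear noise schedule) are illustrative assumptions, not the actual Text2CT architecture, prompt formulation, or training setup described in the paper.

```python
# Hypothetical sketch of text-conditioned 3D diffusion sampling; not the Text2CT model.
import torch
import torch.nn as nn


class TinyDenoiser3D(nn.Module):
    """Toy noise-prediction network conditioned on a text embedding (assumed design)."""
    def __init__(self, embed_dim: int = 64, channels: int = 16):
        super().__init__()
        self.cond_proj = nn.Linear(embed_dim, channels)
        self.stem = nn.Conv3d(1, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.SiLU(),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(channels, 1, 3, padding=1),
        )

    def forward(self, x, text_emb):
        h = self.stem(x)
        # Inject the text condition as a per-channel shift (FiLM-style conditioning).
        h = h + self.cond_proj(text_emb)[:, :, None, None, None]
        return self.body(h)


@torch.no_grad()
def sample(model, text_emb, shape=(1, 1, 32, 32, 32), steps=50):
    """DDPM-style ancestral sampling: start from noise and iteratively denoise."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        eps = model(x, text_emb)  # predicted noise at step t
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x


if __name__ == "__main__":
    model = TinyDenoiser3D()
    text_emb = torch.randn(1, 64)  # stand-in for an encoded free-text report
    volume = sample(model, text_emb)
    print(volume.shape)            # torch.Size([1, 1, 32, 32, 32])
```

In the paper's setting, the stand-in random embedding would be replaced by the output of a medical text encoder, and the toy denoiser by a high-capacity 3D network operating at CT resolution.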
@article{guo2025_2505.04522,
  title={Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model},
  author={Pengfei Guo and Can Zhao and Dong Yang and Yufan He and Vishwesh Nath and Ziyue Xu and Pedro R. A. S. Bassi and Zongwei Zhou and Benjamin D. Simon and Stephanie Anne Harmon and Baris Turkbey and Daguang Xu},
  journal={arXiv preprint arXiv:2505.04522},
  year={2025}
}