ResearchTrend.AI

CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering

22 May 2025
Yuren Mao
Wenyi Xu
Yuyang Qin
Yunjun Gao
    MedIm
arXiv · PDF · HTML
Abstract

A Computed Tomography (CT) scan produces 3D volumetric medical data that can be viewed as hundreds of cross-sectional images (a.k.a. slices), providing detailed anatomical information for diagnosis. For radiologists, creating CT radiology reports is time-consuming and error-prone. A visual question answering (VQA) system that can answer radiologists' questions about anatomical regions on a CT scan, and even automatically generate a radiology report, is urgently needed. However, existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task because: (1) anatomic complexity makes CT images difficult to understand; and (2) the spatial relationship across hundreds of slices is difficult to capture. To address these issues, this paper proposes CT-Agent, a multimodal agentic framework for CTQA. CT-Agent adopts anatomically independent tools to break down the anatomic complexity; furthermore, it efficiently captures the across-slice spatial relationship with a global-local token compression strategy. Experimental results on two 3D chest CT datasets, CT-RATE and RadGenome-ChestCT, verify the superior performance of CT-Agent.
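The abstract does not spell out how the global-local token compression works; the following is only a minimal NumPy sketch of the general idea, assuming per-slice visual tokens are pooled into a single volume-level "global" token plus a few "local" tokens per slice. The function name, shapes, and pooling choices are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def global_local_compress(slice_tokens, k_local=4):
    """Hypothetical global-local token compression sketch.

    slice_tokens: array of shape (num_slices, tokens_per_slice, dim),
        the visual tokens extracted from each CT slice.
    Returns a (1, dim) global token summarizing the whole volume and a
    (num_slices * k_local, dim) array of pooled local tokens, shrinking
    the token sequence from hundreds of slices to a manageable length.
    """
    n, t, d = slice_tokens.shape
    # Global token: mean over all slices and tokens (volume-level context).
    global_tok = slice_tokens.reshape(-1, d).mean(axis=0, keepdims=True)
    # Local tokens: average-pool each slice's tokens down to k_local tokens.
    assert t % k_local == 0, "tokens_per_slice must divide evenly"
    local = slice_tokens.reshape(n, k_local, t // k_local, d).mean(axis=2)
    return global_tok, local.reshape(n * k_local, d)
```

Under these assumptions, a 300-slice scan with 256 tokens per slice (76,800 tokens) would compress to 1 + 300 × 4 = 1,201 tokens, which is far closer to what an LLM context can hold.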

@article{mao2025_2505.16229,
  title={CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering},
  author={Yuren Mao and Wenyi Xu and Yuyang Qin and Yunjun Gao},
  journal={arXiv preprint arXiv:2505.16229},
  year={2025}
}