ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.13668
34
47

MAIRA-1: A specialised large multimodal model for radiology report generation

22 November 2023
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
M. Ranjit
Anton Schwaighofer
Fernando Pérez-García
Valentina Salvatelli
Shaury Srivastav
Anja Thieme
Noel C. F. Codella
M. Lungren
Maria T. A. Wetscherek
Ozan Oktay
Javier Alvarez-Valle
ArXivPDFHTML
Abstract

We present a radiology-specific multimodal model for the task for generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language model(s) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.

View on arXiv
Comments on this paper