Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning

3 June 2025
Negin Baghbanzadeh
Sajad Ashkezari
Elham Dolatabadi
Arash Afkanpour
    MedIm
Main: 9 pages · 4 figures · 8 tables · Appendix: 4 pages · Bibliography: 2 pages
Abstract

Compound figures, which are multi-panel composites containing diverse subfigures, are ubiquitous in biomedical literature, yet large-scale subfigure extraction remains largely unaddressed. Prior work on subfigure extraction has been limited in both dataset size and generalizability, leaving a critical open question: How does high-fidelity image-text alignment via large-scale subfigure extraction impact representation learning in vision-language models? We address this gap by introducing a scalable subfigure extraction pipeline based on transformer-based object detection, trained on a synthetic corpus of 500,000 compound figures, and achieving state-of-the-art performance on both ImageCLEF 2016 and synthetic benchmarks. Using this pipeline, we release OPEN-PMC-18M, a large-scale, high-quality biomedical vision-language dataset comprising 18 million clinically relevant subfigure-caption pairs spanning radiology, microscopy, and visible light photography. We train and evaluate vision-language models on our curated datasets and show improved performance across retrieval, zero-shot classification, and robustness benchmarks, outperforming existing baselines. We release our dataset, models, and code to support reproducible benchmarks and further study of biomedical vision-language modeling and representation learning.
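
The abstract's core step, detecting and cropping subfigures from a compound figure with a transformer-based object detector, can be sketched roughly as below. This is a minimal illustration, not the authors' released pipeline: the facebook/detr-resnet-50 checkpoint, the 0.7 score threshold, and the extract_subfigures helper are placeholder assumptions, since the paper's actual detector weights and post-processing are not given on this page.

# Minimal sketch of DETR-style subfigure extraction (assumptions noted above).
# The COCO-pretrained checkpoint is a generic placeholder, not the Open-PMC-18M detector.
from PIL import Image
import torch
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

def extract_subfigures(path, score_threshold=0.7):
    """Return cropped subfigure regions detected in a compound figure."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert raw logits and boxes to scored boxes in original image coordinates.
    target_sizes = torch.tensor([image.size[::-1]])  # PIL size is (w, h); DETR expects (h, w)
    detections = processor.post_process_object_detection(
        outputs, threshold=score_threshold, target_sizes=target_sizes
    )[0]
    crops = []
    for box in detections["boxes"]:
        x0, y0, x1, y1 = box.tolist()
        crops.append(image.crop((x0, y0, x1, y1)))
    return crops

subfigures = extract_subfigures("compound_figure.png")

In the paper's setting, the detector is reported to be trained on a synthetic corpus of 500,000 compound figures; with such weights, the same post-processing would yield the subfigure crops that are then paired with captions for vision-language training.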

@article{baghbanzadeh2025_2506.02738,
  title={Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning},
  author={Negin Baghbanzadeh and Sajad Ashkezari and Elham Dolatabadi and Arash Afkanpour},
  journal={arXiv preprint arXiv:2506.02738},
  year={2025}
}