2209.07098
Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
15 September 2022
Zhihong Chen
Yu Du
Jinpeng Hu
Yang Liu
Guanbin Li
Xiang Wan
Tsung-Hui Chang
ArXiv
Papers citing
"Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training"
23 / 23 papers shown
Title
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
Wenqi Zeng
Yuqi Sun
Chenxi Ma
Weimin Tan
Bo Yan
LM&MA
VLM
50
0
0
09 May 2025
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
Zibo Xu
Qiang Li
Weizhi Nie
Weijie Wang
Anan Liu
CML
MedIm
47
0
0
05 May 2025
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
Gang Liu
Jinlong He
Pengfei Li
Genrong He
Zixu Zhao
Shenjun Zhong
LM&MA
76
2
0
17 Jan 2025
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu
K. Zou
Liming Zhan
Zexin Lu
Xiaoyu Dong
Yidi Chen
Chengqiang Xie
Jiannong Cao
Xiao-Ming Wu
Huazhu Fu
120
0
0
25 Nov 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
35
0
0
01 Oct 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
64
6
0
13 Aug 2024
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li
Tianyu Luan
Yizhou Wu
Shaoyan Pan
Yenho Chen
Xiaofeng Yang
37
4
0
09 Jul 2024
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Pingyi Chen
Chenglu Zhu
Sunyi Zheng
Honglin Li
Lin Yang
47
6
0
08 Jul 2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning
Zishan Gu
Fenglin Liu
Changchang Yin
Ping Zhang
LRM
LM&MA
43
0
0
19 May 2024
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
Tiancheng Gu
Kaicheng Yang
Dongnan Liu
Weidong Cai
MedIm
29
2
0
19 Apr 2024
Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
Jiawei Chen
Yue Jiang
Dingkang Yang
Mingcheng Li
Jinjie Wei
Ziyun Qian
Lihua Zhang
LM&MA
27
9
0
11 Mar 2024
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
Cheng Li
Weijian Huang
Hao Yang
Jiarun Liu
Shanshan Wang
MedIm
30
4
0
21 Jan 2024
Enhancing medical vision-language contrastive learning via inter-matching relation modelling
Mingjian Li
Mingyuan Meng
M. Fulham
David Dagan Feng
Lei Bi
Jinman Kim
VLM
40
1
0
19 Jan 2024
UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts
Chenlu Zhan
Yufei Zhang
Yu Lin
Gaoang Wang
Hongwei Wang
VLM
MedIm
26
5
0
18 Dec 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
N. Padoy
Nicolas Padoy
24
20
0
27 Jul 2023
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Zunnan Xu
Zhihong Chen
Yong Zhang
Yibing Song
Xiang Wan
Guanbin Li
VLM
35
47
0
21 Jul 2023
Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
Xiaofei Chen
Yuting He
Cheng Xue
Rongjun Ge
Shuo Li
Guanyu Yang
VLM
MedIm
24
12
0
14 Jul 2023
Bi-VLGM : Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation
Wenting Chen
Jie Liu
Yixuan Yuan
VLM
26
3
0
20 May 2023
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
Zhihong Chen
Shizhe Diao
Benyou Wang
Guanbin Li
Xiang Wan
MedIm
17
29
0
17 Feb 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
39
7
0
28 Jan 2023
Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao
Yutong Bai
Alan Yuille
Zongwei Zhou
MedIm
ViT
32
58
0
23 Oct 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016