arXiv 2504.14391 (v2, latest)
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
19 April 2025
Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou
LM&MA
Papers citing "How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?"
40 / 40 papers shown
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, ..., Ishmam Zabir, Yunan Zhang, Li Zhang, Yanzhe Zhang, Xiren Zhou
MoE, SyDa
122
70
0
03 Mar 2025
Qwen2.5-VL Technical Report
S. Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, ..., Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin
VLM
405
699
0
20 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, ..., Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang
180
51
0
21 Jan 2025
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, ..., Xiaofang Wang, F. Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia
VLM
188
28
0
13 Dec 2024
GPT-4o System Card
OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, ..., Yuchen He, Yuchen Zhang, Yujia Jin, Yunxing Dai, Yury Malkov
MLLM
237
1,038
0
25 Oct 2024
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
Milos Vukadinovic, Xiu Tang, N. Yuan, Paul Cheng, Debiao Li, Susan Cheng, Bryan He, David Ouyang
58
11
0
13 Oct 2024
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, ..., Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou
LM&MA, MedIm
177
32
0
06 Aug 2024
Capabilities of Gemini Models in Medicine
Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, ..., Christopher Semturs, S. S. Mahdavi, Juraj Gottweis, Alan Karthikesalingam, Vivek Natarajan
ELM, AI4MH, LM&MA
81
183
0
29 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Sam Ade Jacobs, A. A. Awan, J. Aneja, Ahmed Hassan Awadallah, ..., Li Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
LRM, ALM
183
1,267
0
22 Apr 2024
Fewer Truncations Improve Language Modeling
Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto
111
14
0
16 Apr 2024
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, Jaewoo Kang
ELM, AI4MH, LRM
116
21
0
30 Mar 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang
VLM, MLLM
115
121
0
18 Mar 2024
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Qiang Li, Xiaoyan Yang, Haowen Wang, Qin Wang, Lei Liu, ..., Wangshu Zhang, Teng Xu, Jinjie Gu, Jing Zheng, Guannan Zhang
LM&MA, ELM, AI4MH
102
16
0
02 Dec 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen, Jinsong Li, Xiao-wen Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
MLLM, VLM
200
683
0
21 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li-ming Yuan
VLM, MLLM
353
711
0
16 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, ..., Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
VLM, MLLM
153
517
0
06 Nov 2023
NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding
Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge
85
15
0
20 Oct 2023
Towards Generalist Biomedical AI
Tao Tu, Shekoofeh Azizi, Danny Driess, M. Schaekermann, Mohamed Amin, ..., Yossi Matias, K. Singhal, Peter R. Florence, Alan Karthikesalingam, Vivek Natarajan
LM&MA, MedIm, AI4MH
111
276
0
26 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
122
1,335
0
17 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby
ViT
182
119
0
12 Jul 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo, M. S. Seyfioglu, Fatemeh Ghezloo, Dylan Stefan Chan Geva, Fatwir Sheikh Mohammed, Pavan Kumar Anand, Ranjay Krishna, Linda G. Shapiro
CLIP, VLM
323
125
0
20 Jun 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Muhammad Maaz, H. Rasheed, Salman Khan, Fahad Shahbaz Khan
MLLM
145
661
0
08 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao
LM&MA, MedIm
138
800
0
01 Jun 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
VLM, MLLM
167
2,074
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
SyDa, VLM, MLLM
577
4,936
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
CLIP, VLM
275
1,205
0
27 Mar 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, ..., Pietro Mascagni, B. Seeliger, Cristians Gonzalez, Didier Mutter, N. Padoy
102
20
0
13 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM, MLLM
442
4,664
0
30 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, C. McLeavey, Ilya Sutskever
OffRL
230
3,766
0
06 Dec 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, ..., Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, R. Kaczmarczyk, J. Jitsev
VLM, MLLM, CLIP
209
3,514
0
16 Oct 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, ..., Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
MLLM, VLM
420
3,615
0
29 Apr 2022
A Dataset for Medical Instructional Video Classification and Question Answering
D. Gupta, Kush Attal, Dina Demner-Fushman
105
33
0
30 Jan 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, Guosheng Lin
MLLM, BDL, VLM, CLIP
557
4,429
0
28 Jan 2022
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, ..., Olivier J. Hénaff, M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira
MLLM, VLM, GNN
112
585
0
30 Jul 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu
346
2,540
0
20 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
VGen
183
1,193
0
01 Apr 2021
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin, Eileen Pan, Nassim Oufattole, W. Weng, Hanyi Fang, Peter Szolovits
FaML, ELM, LM&MA
132
814
0
28 Sep 2020
PathVQA: 30000+ Questions for Medical Visual Question Answering
Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, P. Xie
LM&MA
71
246
0
07 Mar 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, Xinghua Lu
416
916
0
13 Sep 2019
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Jeremy Irvin, Pranav Rajpurkar, M. Ko, Yifan Yu, Silviana Ciurea-Ilcus, ..., D. Larson, C. Langlotz, Bhavik Patel, M. Lungren, A. Ng
120
2,610
0
21 Jan 2019