ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.14391
  4. Cited By
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
v1v2 (latest)

How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?

19 April 2025
Rahul Thapa
Andrew Li
Qingyang Wu
Bryan He
Yuki Sahashi
Christina Binder
Angela Zhang
Ben Athiwaratkun
Shuaiwen Leon Song
David Ouyang
James Zou
    LM&MA
ArXiv (abs)PDFHTML

Papers citing "How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?"

40 / 40 papers shown
Title
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Yanzhe Zhang
Xiren Zhou
MoESyDa
122
70
0
03 Mar 2025
Qwen2.5-VL Technical Report
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
405
699
0
20 Feb 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
180
51
0
21 Jan 2025
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar
Xiaohan Wang
Yann Dubois
Nikhil Mehta
Tong Xiao
...
Xiaofang Wang
F. Xu
Ning Zhang
Serena Yeung-Levy
Xide Xia
VLM
188
28
0
13 Dec 2024
GPT-4o System Card
GPT-4o System Card
OpenAI OpenAI
:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
237
1,038
0
25 Oct 2024
EchoPrime: A Multi-Video View-Informed Vision-Language Model for
  Comprehensive Echocardiography Interpretation
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
Milos Vukadinovic
Xiu Tang
N. Yuan
Paul Cheng
Debiao Li
Susan Cheng
Bryan He
David Ouyang
58
11
0
13 Oct 2024
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie
Ce Zhou
Lang Gao
Juncheng Wu
Xianhang Li
...
Sheng Liu
Lei Xing
James Zou
Cihang Xie
Yuyin Zhou
LM&MAMedIm
177
32
0
06 Aug 2024
Capabilities of Gemini Models in Medicine
Capabilities of Gemini Models in Medicine
Khaled Saab
Tao Tu
Wei-Hung Weng
Ryutaro Tanno
David Stutz
...
Christopher Semturs
S. S. Mahdavi
Juraj Gottweis
Alan Karthikesalingam
Vivek Natarajan
ELMAI4MHLM&MA
81
183
0
29 Apr 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your
  Phone
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRMALM
183
1,267
0
22 Apr 2024
Fewer Truncations Improve Language Modeling
Fewer Truncations Improve Language Modeling
Hantian Ding
Zijian Wang
Giovanni Paolini
Varun Kumar
Anoop Deoras
Dan Roth
Stefano Soatto
111
14
0
16 Apr 2024
Small Language Models Learn Enhanced Reasoning Skills from Medical
  Textbooks
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Hyunjae Kim
Hyeon Hwang
Jiwoo Lee
Sihyeon Park
Dain Kim
Taewhoo Lee
Chanwoong Yoon
Jiwoong Sohn
Donghee Choi
Jaewoo Kang
ELMAI4MHLRM
116
21
0
30 Mar 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
VLMMLLM
115
121
0
18 Mar 2024
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Qiang Li
Xiaoyan Yang
Haowen Wang
Qin Wang
Lei Liu
...
Wangshu Zhang
Teng Xu
Jinjie Gu
Jing Zheng
Guannan Zhang
LM&MAELMAI4MH
102
16
0
02 Dec 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLMVLM
200
683
0
21 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLMMLLM
353
711
0
16 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLMMLLM
153
517
0
06 Nov 2023
NurViD: A Large Expert-Level Video Database for Nursing Procedure
  Activity Understanding
NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding
Ming Hu
Lin Wang
Siyuan Yan
Don Ma
Qingli Ren
Peng Xia
Wei Feng
Peibo Duan
Lie Ju
Zongyuan Ge
85
15
0
20 Oct 2023
Towards Generalist Biomedical AI
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MAMedImAI4MH
111
276
0
26 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work
  Partitioning
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
122
1,335
0
17 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and
  Resolution
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Luvcić
N. Houlsby
ViT
182
119
0
12 Jul 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIPVLM
323
125
0
20 Jun 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and
  Language Models
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
MLLM
145
661
0
08 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for
  Biomedicine in One Day
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MAMedIm
138
800
0
01 Jun 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLMMLLM
167
2,074
0
20 Apr 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
577
4,936
0
17 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIPVLM
275
1,205
0
27 Mar 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an
  endoscopic vision challenge for surgical action triplet detection
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
102
20
0
13 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
442
4,664
0
30 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,766
0
06 Dec 2022
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLMMLLMCLIP
209
3,514
0
16 Oct 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
420
3,615
0
29 Apr 2022
A Dataset for Medical Instructional Video Classification and Question
  Answering
A Dataset for Medical Instructional Video Classification and Question Answering
D. Gupta
Kush Attal
Dina Demner-Fushman
105
33
0
30 Jan 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
557
4,429
0
28 Jan 2022
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLMVLMGNN
112
585
0
30 Jul 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
346
2,540
0
20 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
183
1,193
0
01 Apr 2021
What Disease does this Patient Have? A Large-scale Open Domain Question
  Answering Dataset from Medical Exams
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaMLELMLM&MA
132
814
0
28 Sep 2020
PathVQA: 30000+ Questions for Medical Visual Question Answering
PathVQA: 30000+ Questions for Medical Visual Question Answering
Xuehai He
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
LM&MA
71
246
0
07 Mar 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
416
916
0
13 Sep 2019
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and
  Expert Comparison
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Jeremy Irvin
Pranav Rajpurkar
M. Ko
Yifan Yu
Silviana Ciurea-Ilcus
...
D. Larson
C. Langlotz
Bhavik Patel
M. Lungren
A. Ng
120
2,610
0
21 Jan 2019
1