ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.23601
  4. Cited By
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis

29 May 2025
Shengyuan Liu
Boyun Zheng
Wenting Chen
Zhihao Peng
Zhenfei Yin
Jing Shao
Jiancong Hu
Yixuan Yuan
    ELM
ArXivPDFHTML

Papers citing "A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis"

46 / 46 papers shown
Title
Qwen2.5-VL Technical Report
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
287
528
0
20 Feb 2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
AI4TS
132
139
0
29 Jan 2025
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MA
MedIm
199
226
0
10 Jan 2025
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced
  Multimodal Understanding
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Z. F. Wu
Xiaokang Chen
Zizheng Pan
Xianglong Liu
Wen Liu
...
Xingkai Yu
Haowei Zhang
Liang Zhao
Yijiao Wang
Chong Ruan
MLLM
VLM
MoE
180
140
0
13 Dec 2024
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath
Wenqi Li
Dong Yang
Andriy Myronenko
Mingxin Zheng
...
Holger Roth
Daguang Xu
Baris Turkbey
Holger Roth
Daguang Xu
VLM
156
6
0
19 Nov 2024
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry
Wenjun Hou
Yi Cheng
Kaishuai Xu
Yan Hu
Wenjie Li
Jiang-Dong Liu
55
1
0
17 Nov 2024
GPT-4o System Card
GPT-4o System Card
OpenAI OpenAI
:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
184
901
0
25 Oct 2024
Frontiers in Intelligent Colonoscopy
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
71
5
0
22 Oct 2024
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri
Zalan Fabian
Maryam Soltanolkotabi
Mahdi Soltanolkotabi
MedIm
105
5
0
23 Sep 2024
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking
  for automatic bleeding classification, detection, and segmentation
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation
Palak Handa
Manas Dhir
Amirreza Mahbod
Florian Schwarzhans
Ramona Woitek
Nidhi Goel
Deepak Gunjan
26
3
0
22 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust
  Visual Question-Localized Answering in Robotic Surgery
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
69
15
0
09 Aug 2024
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Yunfei Xie
Ce Zhou
Lang Gao
Juncheng Wu
Xianhang Li
...
Sheng Liu
Lei Xing
James Zou
Cihang Xie
Yuyin Zhou
LM&MA
MedIm
131
30
0
06 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
91
6
0
27 Jul 2024
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into
  Multimodal LLMs at Scale
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Junying Chen
Ruyi Ouyang
Anningzhe Gao
Shunian Chen
Guiming Hardy Chen
...
Zhenyang Cai
Ke Ji
Guangjun Yu
Xiang Wan
Benyou Wang
MedIm
LM&MA
59
44
0
27 Jun 2024
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision
  Language Models
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia
Ze Chen
Juanxi Tian
Yangrui Gong
Ruibo Hou
...
Jimeng Sun
Zongyuan Ge
Gang Li
James Zou
Huaxiu Yao
MU
VLM
83
36
0
10 Jun 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
80
18
0
22 Mar 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for
  Medical LVLM
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu
Tian-Xin Li
Quanfeng Lu
Wenqi Shao
Junjun He
Yu Qiao
Ping Luo
ELM
LM&MA
49
63
0
14 Feb 2024
Advancing Surgical VQA with Scene Graph Knowledge
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
76
20
0
15 Dec 2023
VILA: On Pre-training for Visual Language Models
VILA: On Pre-training for Visual Language Models
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM
VLM
84
400
0
12 Dec 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
213
905
0
27 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
MLLM
VLM
188
656
0
21 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
82
489
0
06 Nov 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large
  Language Model
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
70
21
0
13 Oct 2023
Med-Flamingo: a Multimodal Medical Few-shot Learner
Med-Flamingo: a Multimodal Medical Few-shot Learner
Michael Moor
Qian Huang
Shirley Wu
Michihiro Yasunaga
C. Zakka
Yashodhara Dalmia
E. Reis
Pranav Rajpurkar
J. Leskovec
LM&MA
MedIm
72
260
0
27 Jul 2023
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided
  Gastrointestinal Disease Detection
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection
Debesh Jha
Vanshali Sharma
N. Dasu
Nikhil Kumar Tomar
Steven Hicks
...
P. Das
Michael A. Riegler
Pål Halvorsen
Ulas Bagci
Thomas de Lange
46
31
0
16 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
349
4,312
0
09 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for
  Biomedicine in One Day
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
101
772
0
01 Jun 2023
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for
  Visual Question Localized-Answering in Robotic Surgery
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Lalithkumar Seenivasan
Hongliang Ren
51
30
0
19 May 2023
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang
Chaoyi Wu
Ziheng Zhao
Weixiong Lin
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
109
173
0
17 May 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question
  Answering in Surgery
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
63
43
0
19 Apr 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
529
4,740
0
17 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
320
3,386
0
14 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
184
1,150
0
27 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
424
4,550
0
30 Jan 2023
Towards Holistic Surgical Scene Understanding
Towards Holistic Surgical Scene Understanding
Natalia Valderrama
Paola Ruiz Puentes
Isabela Hernández
Nicolás Ayobi
Mathilde Verlyck
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
53
35
0
08 Dec 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
265
1,245
0
20 Sep 2022
Surgical-VQA: Visual Question Answering in Surgical Scenes using
  Transformer
Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer
Lalithkumar Seenivasan
Mobarakol Islam
Adithya K. Krishna
Hongliang Ren
MedIm
45
48
0
22 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
382
3,535
0
29 Apr 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
530
4,343
0
28 Jan 2022
A multi-centre polyp detection and segmentation dataset for
  generalisability assessment
A multi-centre polyp detection and segmentation dataset for generalisability assessment
Sharib Ali
Debesh Jha
N. Ghatwary
S. Realdon
R. Cannizzaro
...
Andreas Petlund
Pål Halvorsen
J. Rittscher
Thomas de Lange
J. East
55
87
0
08 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
923
29,372
0
26 Feb 2021
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset
  in gastrointestinal endoscopy
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy
Debesh Jha
Sharib Ali
Krister Emanuelsen
Steven A. Hicks
VajiraThambawita
...
Thomas de Lange
P. Schmidt
H. Johansen
Dag Johansen
Pål Halvorsen
46
113
0
23 Oct 2020
Endoscopy disease detection challenge 2020
Endoscopy disease detection challenge 2020
Sharib Ali
N. Ghatwary
B. Braden
Dominique Lamarque
A. Bailey
S. Realdon
R. Cannizzaro
J. Rittscher
Christian Daul
J. East
101
28
0
07 Mar 2020
Kvasir-SEG: A Segmented Polyp Dataset
Kvasir-SEG: A Segmented Polyp Dataset
Debesh Jha
P. Smedsrud
Michael A. Riegler
Pål Halvorsen
Thomas de Lange
Dag Johansen
Haavard D. Johansen
186
1,169
0
16 Nov 2019
A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images
David Vázquez
Jorge Bernal
F. Sánchez
Gloria Fernández-Esparrach
Antonio M. López
Adriana Romero
M. Drozdzal
Aaron Courville
3DV
178
638
0
02 Dec 2016
EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic
  Videos
EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos
A. P. Twinanda
S. Shehata
Didier Mutter
J. Marescaux
M. de Mathelin
N. Padoy
238
862
0
09 Feb 2016
1