Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review

24 February 2025
Ufaq Khan
Umair Nawaz
A. Qayyum
Shazad Ashraf
Muhammad Bilal
Junaid Qadir
arXiv: 2502.14886

Papers citing "Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review"

50 / 142 papers shown
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
151
21
0
17 Jan 2025
Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges
Minghao Shao
Abdul Basit
Ramesh Karri
Muhammad Shafique
88
14
0
04 Dec 2024
Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions
Kai Sun
Siyan Xue
F. Sun
Haoran Sun
Yu-Juan Luo
...
Xinzhou Wang
Lei Yang
Shuo Jin
Jun Yan
Jiahong Dong
AI4CE
95
2
0
03 Dec 2024
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
Juseong Jin
Chang Wook Jeong
43
3
0
13 Oct 2024
SURGIVID: Annotation-Efficient Surgical Video Object Discovery
Çağhan Köksal
Ghazal Ghazaei
Nassir Navab
37
1
0
12 Sep 2024
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho
Samuel Schmidgall
C. Zakka
Mrudang Mathur
Dhamanpreet Kaur
R. Shad
W. Hiesinger
VGen
MedIm
49
7
0
26 Aug 2024
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning
Jiajie Li
Garrett C Skinner
Gene Yang
Brian R Quaranto
Steven D. Schwaitzberg
Peter C W Kim
Jinjun Xiong
63
10
0
15 Aug 2024
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Haofeng Liu
Erli Zhang
Junde Wu
Mingxuan Hong
Yueming Jin
MedIm
60
16
0
15 Aug 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
61
14
0
09 Aug 2024
Segment Anything in Medical Images and Videos: Benchmark and Deployment
Jun Ma
Sumin Kim
Feifei Li
Mohammed Baharoon
Reza Asakereh
Hongwei Lyu
Bo Wang
VLM
MedIm
54
35
0
06 Aug 2024
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
...
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
58
796
0
01 Aug 2024
ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
Zhen Chen
Zongmin Zhang
Wenwu Guo
Xingjian Luo
Long Bai
Jinlin Wu
Hongliang Ren
Hongbin Liu
46
5
0
28 Jul 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
77
6
0
27 Jul 2024
CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM
Aditya Murali
Pietro Mascagni
Didier Mutter
N. Padoy
VLM
MedIm
77
3
0
09 Jul 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
68
3
0
26 Jun 2024
EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
Ryo Fujii
Hideo Saito
Hiroki Kajita
47
5
0
05 Jun 2024
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
Maojun Sun
37
3
0
04 Jun 2024
Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery
Yuyang Sheng
Sophia Bano
Matthew J. Clarkson
Mobarakol Islam
54
7
0
22 Apr 2024
Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models
Fiona R. Kolbinger
Jiangpeng He
Jinge Ma
Fengqing Zhu
33
2
0
25 Mar 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
57
17
0
22 Mar 2024
Endora: Video Generation Models as Endoscopy Simulators
Chenxin Li
Hengyu Liu
Yifan Liu
Brandon Yushan Feng
Wuyang Li
Xinyu Liu
Zhen Chen
Jing Shao
Yixuan Yuan
VGen
MedIm
93
35
0
17 Mar 2024
From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments
K. J. Oguine
R. Soberanis-Mukul
Nathan G. Drenkow
Mathias Unberath
MedIm
52
7
0
28 Feb 2024
Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning
Jingying Wang
Haoran Tang
Taylor Kantor
Tandis Soltani
Vitaliy Popov
Xu Wang
MedIm
40
7
0
27 Feb 2024
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
Nicolás Ayobi
Santiago Rodríguez
Alejandra Pérez
Isabela Hernández
Nicolás Aparicio
...
Sebastián Pena
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
ViT
MedIm
45
9
0
20 Jan 2024
Foundation Models for Biomedical Image Segmentation: A Survey
Ho Hin Lee
Yu Gu
Theodore Zhao
Yanbo Xu
Jianwei Yang
...
Mu-Hsin Wei
Bennett A. Landman
Yuankai Huo
Alberto Santamaría-Pang
Hoifung Poon
MedIm
VLM
55
16
0
15 Jan 2024
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Beilei Cui
Mobarakol Islam
Long Bai
Hongliang Ren
MedIm
59
38
0
11 Jan 2024
Large Model based Sequential Keyframe Extraction for Video Summarization
Kailong Tan
Yuxiang Zhou
Qianchen Xia
Rui Liu
Yong Chen
33
6
0
10 Jan 2024
RudolfV: A Foundation Model by Pathologists for Pathologists
Jonas Dippel
Barbara Feulner
Tobias Winterhoff
Timo Milbich
Stephan Tietz
...
David Horst
Lukas Ruff
Klaus-Robert Muller
Frederick Klauschen
Maximilian Alber
46
30
0
08 Jan 2024
Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions
Yichi Zhang
Zhenrong Shen
Rushi Jiao
VLM
MedIm
51
120
0
07 Jan 2024
SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation
Wenxi Yue
Jing Zhang
Kun Hu
Qiuxia Wu
Zongyuan Ge
Yong Xia
Jiebo Luo
Zhiyong Wang
44
3
0
22 Dec 2023
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
64
18
0
15 Dec 2023
Large language models in healthcare and medical domain: A review
Zabir Al Nazi
Wei Peng
LM&MA
49
141
0
12 Dec 2023
CLIP in Medical Imaging: A Comprehensive Survey
Zihao Zhao
Yuxiao Liu
Han Wu
Yonghao Li
Sheng Wang
L. Teng
Disheng Liu
Zhiming Cui
Qian Wang
Dinggang Shen
CLIP
MedIm
LM&MA
VLM
41
43
0
12 Dec 2023
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Bobby Azad
Reza Azad
Sania Eskandari
Afshin Bozorgpour
Amirhossein Kazerouni
I. Rekik
Dorit Merhof
VLM
MedIm
114
62
0
28 Oct 2023
Tracking and Mapping in Medical Computer Vision: A Review
Adam Schmidt
Omid Mohareri
S. DiMaio
Michael C. Yip
Septimiu E. Salcudean
62
34
0
17 Oct 2023
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Jianing Qiu
Jian Wu
Hao Wei
Peilun Shi
Minqing Zhang
...
Benny Lo
Yih-Chung Tham
T. Y. Wong
Ningli Wang
Wu Yuan
MedIm
VLM
LM&MA
29
21
0
08 Oct 2023
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation
Cheng Chen
Juzheng Miao
Dufan Wu
Zhiling Yan
Sekeun Kim
...
Lichao Sun
Xiang Li
Tianming Liu
Pheng-Ann Heng
Quanzheng Li
MedIm
76
58
0
16 Sep 2023
SAM3D: Segment Anything Model in Volumetric Medical Images
Nhat-Tan Bui
Dinh-Hieu Hoang
Minh-Triet Tran
Gianfranco Doretto
Donald Adjeroh
Brijesh Patel
Arabinda Choudhary
Ngan Le
MedIm
47
40
0
07 Sep 2023
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue
Jing Zhang
Kun Hu
Yong-quan Xia
Jiebo Luo
Zhiyong Wang
VLM
MedIm
45
65
0
17 Aug 2023
SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation
An-Chi Wang
Mobarakol Islam
Mengya Xu
Yang Zhang
Hongliang Ren
AAML
33
27
0
14 Aug 2023
Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?
Risab Biswas
MedIm
55
22
0
12 Aug 2023
AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation
Jay N. Paranjape
Nithin Gopalakrishnan Nair
S. Sikder
S. Vedula
Vishal M. Patel
MedIm
33
40
0
07 Aug 2023
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
45
20
0
11 Jul 2023
Pruning vs Quantization: Which is Better?
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
62
48
0
06 Jul 2023
Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Zhao Wang
Chang Liu
Shaoting Zhang
Qi Dou
MedIm
60
61
0
29 Jun 2023
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
D. M. Nguyen
Hoang Nguyen
Nghiem Tuong Diep
Tan Ngoc Pham
T. Cao
...
Nhat Ho
Shadi Albarqouni
P. Xie
Daniel Sonntag
Mathias Niepert
VLM
MedIm
41
50
0
20 Jun 2023
A spatio-temporal network for video semantic segmentation in surgical videos
M. Grammatikopoulou
Ricardo Sánchez-Matilla
Felix J. S. Bragman
David Owen
Lucy H Culshaw
K. Kerr
Danail Stoyanov
Imanol Luengo
60
16
0
19 Jun 2023
On the Challenges and Perspectives of Foundation Models for Medical Image Analysis
Shaoting Zhang
Dimitris N. Metaxas
LM&MA
VLM
MedIm
AI4CE
68
135
0
09 Jun 2023
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Lalithkumar Seenivasan
Hongliang Ren
36
27
0
19 May 2023
LEO: Generative Latent Image Animator for Human Video Synthesis
Yaohui Wang
Xin Ma
Xinyuan Chen
A. Dantcheva
Bo Dai
Yu Qiao
DiffM
103
31
0
06 May 2023