ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
v1, v2, v3 (latest)


22 March 2024
Guankun Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
arXiv: 2405.10948

Papers citing "Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery"

50 / 51 papers shown
A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
Shengyuan Liu
Boyun Zheng
Wenting Chen
Zhihao Peng
Zhenfei Yin
Jing Shao
Jiancong Hu
Yixuan Yuan
ELM
66
0
0
29 May 2025
A Stereotype Content Analysis on Color-related Social Bias in Large Vision Language Models
Junhyuk Choi
Minju Kim
Yeseon Hong
Bugeun Kim
46
0
0
27 May 2025
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models
Nanxing Hu
Xiaoyue Duan
Jinchao Zhang
Guoliang Kang
MLLM
40
0
0
26 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I. Roumeliotis
Manoj Karkee
LM&Ro
388
2
0
07 May 2025
Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement
Long Bai
Boyi Ma
Ruohan Wang
Guankun Wang
Beilei Cui
...
Mobarakol Islam
Zhe Min
Jiewen Lai
Nassir Navab
Hongliang Ren
101
0
0
03 May 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Yize Zhang
MLLM, VLM
93
3
0
01 Mar 2025
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
Ufaq Khan
Umair Nawaz
A. Qayyum
Shazad Ashraf
Muhammad Bilal
Junaid Qadir
138
0
0
24 Feb 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
299
699
0
31 Dec 2024
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
Tong Chen
Shuya Yang
Junyi Wang
Long Bai
Hongliang Ren
Luping Zhou
VGen, MedIm
149
4
0
18 Dec 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM, CLIP
170
12
0
23 Nov 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
Shijie Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM, VLM
462
6
0
22 Nov 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
79
7
0
10 Oct 2024
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery
Long Bai
Guankun Wang
Mobarakol Islam
Lalithkumar Seenivasan
An-Chi Wang
Hongliang Ren
74
15
0
09 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
97
6
0
27 Jul 2024
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin
Hanjia Lyu
Xian Xu
Jiebo Luo
60
2
0
13 Jun 2024
VM-UNet: Vision Mamba UNet for Medical Image Segmentation
Jiacheng Ruan
Suncheng Xiang
Mamba
141
281
0
04 Feb 2024
Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan
Manasi Kattel
Joël L. Lavanchy
Nassir Navab
V. Srivastav
N. Padoy
93
21
0
15 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chunyuan Li
Jianwei Yang
163
75
0
05 Dec 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM, MLLM
120
509
0
06 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
241
471
0
14 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD, MLLM, VLM
113
328
0
11 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM, MLLM
160
2,817
0
05 Oct 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM, VLM, ObjD
151
932
0
24 Aug 2023
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
58
19
0
22 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
73
111
0
17 Jul 2023
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Hongliang Ren
58
20
0
11 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
426
4,422
0
09 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Chunyuan Li
Ziwei Liu
MLLM, VLM
90
240
0
08 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA, MedIm
123
794
0
01 Jun 2023
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Lalithkumar Seenivasan
Hongliang Ren
62
31
0
19 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
139
2,095
0
11 May 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
68
43
0
19 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
569
4,910
0
17 Apr 2023
Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder
Yunyi Liu
Zhanyu Wang
Dong Xu
Luping Zhou
ViT, MedIm
57
37
0
04 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,748
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,437
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
429
4,642
0
30 Jan 2023
Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer
Lalithkumar Seenivasan
Mobarakol Islam
Adithya K. Krishna
Hongliang Ren
MedIm
61
48
0
22 Jun 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
96
276
0
22 Mar 2022
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding
Lalithkumar Seenivasan
Sai Mitheran
Mobarakol Islam
Hongliang Ren
67
35
0
28 Jan 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
490
10,496
0
17 Jun 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,805
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
673
41,430
0
22 Oct 2020
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery
Mobarakol Islam
Lalithkumar Seenivasan
Lim Chwee Ming
Hongliang Ren
62
41
0
07 Jul 2020
2018 Robotic Scene Segmentation Challenge
M. Allan
S. Kondo
S. Bodenstedt
S. Leger
Rahim Kadkhodamohammadi
...
Sang Hyun Park
M. Azizian
Danail Stoyanov
Lena Maier-Hein
Stefanie Speidel
70
136
0
30 Jan 2020
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
153
1,965
0
09 Aug 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
154
4,180
0
25 Feb 2019
2017 Robotic Instrument Segmentation Challenge
M. Allan
Alexey A. Shvets
T. Kurmann
Zichen Zhang
Rahul Duggal
...
Jian Yang
Danail Stoyanov
Lena Maier-Hein
Stefanie Speidel
M. Azizian
89
230
0
18 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
57
218
0
31 Jan 2019
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
168
583
0
18 May 2017