ResearchTrend.AI

CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

11 July 2023
Long Bai
Mobarakol Islam
Hongliang Ren

Papers citing "CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery"

26 / 26 papers shown
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
120
17
0
22 Mar 2024
Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Long Bai
Mobarakol Islam
Lalithkumar Seenivasan
Hongliang Ren
73
32
0
19 May 2023
Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images
Yanan Wu
Shuiqing Zhao
Shouliang Qi
Jie Feng
H. Pang
...
Long Bai
Meng-Yi Li
Shuyue Xia
W. Qian
Hongliang Ren
ViT, MedIm
71
25
0
15 Dec 2022
Learning Robust Representation for Joint Grading of Ophthalmic Diseases via Adaptive Curriculum and Feature Disentanglement
Haoxuan Che
Haibo Jin
Haoxing Chen
OOD
98
23
0
09 Jul 2022
Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer
Lalithkumar Seenivasan
Mobarakol Islam
Adithya K. Krishna
Hongliang Ren
MedIm
72
48
0
22 Jun 2022
Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution
Mariana-Iuliana Georgescu
Radu Tudor Ionescu
A. Miron
O. Savencu
Nicolae-Cătălin Ristea
N. Verga
Fahad Shahbaz Khan
SupR
49
52
0
08 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
62
17
0
31 Mar 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
89
71
0
28 Mar 2022
Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding
Lalithkumar Seenivasan
Sai Mitheran
Mobarakol Islam
Hongliang Ren
78
35
0
28 Jan 2022
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,813
0
23 Dec 2020
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery
Mobarakol Islam
Lalithkumar Seenivasan
Lim Chwee Ming
Hongliang Ren
67
41
0
07 Jul 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT, 3DV, PINN
456
13,130
0
26 May 2020
2018 Robotic Scene Segmentation Challenge
M. Allan
S. Kondo
S. Bodenstedt
S. Leger
Rahim Kadkhodamohammadi
...
Sang Hyun Park
M. Azizian
Danail Stoyanov
Lena Maier-Hein
Stefanie Speidel
80
136
0
30 Jan 2020
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
155
1,967
0
09 Aug 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
91
808
0
25 Jun 2019
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
Dan Hendrycks
Thomas G. Dietterich
OOD, VLM
198
3,458
0
28 Mar 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
154
4,186
0
25 Feb 2019
2017 Robotic Instrument Segmentation Challenge
M. Allan
Alexey A. Shvets
T. Kurmann
Zichen Zhang
Rahul Duggal
...
Jian Yang
Danail Stoyanov
Lena Maier-Hein
Stefanie Speidel
M. Azizian
89
230
0
18 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
57
218
0
31 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,324
0
11 Oct 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
811
132,725
0
12 Jun 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
171
583
0
18 May 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
95
382
0
07 Feb 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.3K
194,641
0
10 Dec 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat, ObjD
540
62,477
0
04 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.1K
150,433
0
22 Dec 2014