ResearchTrend.AI

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

24 February 2024
Wentao Mo
Yang Liu
ArXiv (abs) | PDF | HTML | GitHub (21★)

Papers citing "Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA"

9 / 9 papers shown
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas
Maria Parelli
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
43
20
0
04 Jun 2023
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
72
158
0
14 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM, BDL, VLM, CLIP
555
4,409
0
28 Jan 2022
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM, MLLM
136
799
0
24 Aug 2021
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
132
1,944
0
13 Apr 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
55
75
0
19 Feb 2020
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM, MLLM
250
2,488
0
20 Aug 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
87
808
0
25 Jun 2019
Deep Hough Voting for 3D Object Detection in Point Clouds
C. Qi
Or Litany
Kaiming He
Leonidas Guibas
3DPC
108
1,290
0
21 Apr 2019