2209.12028
Cited By
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
24 September 2022
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
Papers citing
"Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline"
15 / 15 papers shown
Title
3D Question Answering for City Scene Understanding
Penglei Sun
Yaoxian Song
Xiang Liu
Xiaofei Yang
Qiang-qiang Wang
Tiefeng Li
Yang Yang
Xiaowen Chu
23
1
0
24 Jul 2024
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
Zhe Yang
Wenrui Li
Guanghui Cheng
Mamba
28
0
0
14 Jun 2024
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu
Zhuofan Zhang
Xiaojian Ma
Xuesong Niu
Yixin Chen
Baoxiong Jia
Zhidong Deng
Siyuan Huang
Qing Li
48
21
0
19 May 2024
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models
Qingrong He
Kejun Lin
Shizhe Chen
Anwen Hu
Qin Jin
LRM
50
1
0
23 Apr 2024
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
Wentao Mo
Yang Liu
24
6
0
24 Feb 2024
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
MLLM
29
82
0
30 Nov 2023
Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture
Yixin Chen
Junfeng Ni
Nan Jiang
Yaowei Zhang
Yixin Zhu
Siyuan Huang
3DV
30
21
0
01 Nov 2023
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data
Taiki Miyanishi
Fumiya Kitamori
Shuhei Kurita
Jungdae Lee
M. Kawanabe
Nakamasa Inoue
AI4TS
3DPC
17
6
0
28 Oct 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
34
104
0
08 Aug 2023
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yueping Jiang
29
137
0
24 May 2023
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Zhang Tao
Su He
D. Tao
Bin Chen
Zhi Wang
Shutao Xia
VLM
37
22
0
18 May 2023
Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning
Jian Zhu
Hanli Wang
Miaojing Shi
LRM
24
4
0
30 Jan 2023
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
405
0
13 Jul 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
71
129
0
01 Mar 2021
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas J. Guibas
3DH
3DPC
3DV
PINN
222
14,131
0
02 Dec 2016