Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.11442
Cited By
Unifying 3D Vision-Language Understanding via Promptable Queries
19 May 2024
Ziyu Zhu
Zhuofan Zhang
Xiaojian Ma
Xuesong Niu
Yixin Chen
Baoxiong Jia
Zhidong Deng
Siyuan Huang
Qing Li
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unifying 3D Vision-Language Understanding via Promptable Queries"
12 / 12 papers shown
Title
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang
Leonor Fermoselle
Duy Ta
Bernadette Bucher
Luca Carlone
Jiuguang Wang
32
0
0
09 Apr 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Y. Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
77
3
0
28 Mar 2025
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
Yu Liu
Baoxiong Jia
Ruijie Lu
Junfeng Ni
Song-Chun Zhu
Siyuan Huang
3DGS
80
8
0
26 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
83
6
0
02 Jan 2025
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Jun Guo
Xiaojian Ma
Yue Fan
Huaping Liu
Qing Li
3DGS
36
26
0
22 Mar 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
267
4,229
0
30 Jan 2023
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
Jonas Schult
Francis Engelmann
Alexander Hermans
Or Litany
Siyu Tang
Bastian Leibe
ISeg
50
164
0
06 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
51
62
0
29 Sep 2022
TEACh: Task-driven Embodied Agents that Chat
Aishwarya Padmakumar
Jesse Thomason
Ayush Shrivastava
P. Lange
Anjali Narayan-Chen
Spandana Gella
Robinson Piramithu
Gökhan Tür
Dilek Z. Hakkani-Tür
LM&Ro
160
180
0
01 Oct 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
253
525
0
04 Feb 2021
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas J. Guibas
3DH
3DPC
3DV
PINN
222
14,099
0
02 Dec 2016
1