Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.07474
Cited By
SQA3D: Situated Question Answering in 3D Scenes
14 October 2022
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SQA3D: Situated Question Answering in 3D Scenes"
50 / 96 papers shown
Title
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
49
0
0
08 May 2025
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration
Huajie Tan
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Yaoxu Lyu
Mingyu Cao
Zhongyuan Wang
S. Zhang
LM&Ro
48
0
0
06 May 2025
RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments
Shiyi Ding
Ying Chen
RALM
42
0
0
11 Apr 2025
The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
Weichen Zhang
Ruiying Peng
Chen Gao
Jianjie Fang
Xin Zeng
...
Zhilin Wang
Jinqiang Cui
Xin Wang
Xinlei Chen
Yongqian Li
LRM
75
0
0
06 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Zhaoxiang Zhang
66
0
0
02 Apr 2025
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan
Yibo Peng
Jinke Ren
Yinghong Liao
Yatong Han
Chun-Mei Feng
Hengshuang Zhao
G. Li
Shuguang Cui
Zhen Li
51
0
0
29 Mar 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
94
0
0
29 Mar 2025
Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments
Yifan Xu
V. Kamat
Carol Menassa
51
0
0
29 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Yixuan Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
84
3
0
28 Mar 2025
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu
Yiquan Li
Khoi Duc Nguyen
Yiwu Zhong
Yin Li
KELM
LRM
86
0
0
25 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yuyao Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
167
2
0
18 Mar 2025
Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning
Xueying Jiang
Wenhao Li
Xiaoqin Zhang
Ling Shao
Shijian Lu
LRM
47
0
0
17 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
45
0
0
17 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Yongqian Li
Xinlei Chen
Xiao-Ping Zhang
LRM
63
1
0
14 Mar 2025
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai
Songyou Peng
Kyle Genova
Leonidas J. Guibas
Thomas Funkhouser
3DGS
79
0
0
08 Mar 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo
Yong-Jin Liu
Weixing Chen
Zhen Li
Yixuan Wang
G. Li
Liang Lin
67
2
0
05 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
J. Chen
Jianke Zhu
3DV
LRM
83
3
0
01 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
99
8
0
28 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
Rui Zhao
Qirui Yuan
Jinyu Li
Haofeng Hu
Yun Li
Chengyuan Zheng
Fei Gao
LRM
52
4
0
19 Feb 2025
Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao
Weixun Luo
Junpeng Jing
Anlan Qiu
K. Mikolajczyk
67
0
0
02 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
83
6
0
02 Jan 2025
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
Tao Wu
Chuhao Zhou
Yen Heng Wong
Lin Gu
Jianfei Yang
89
1
0
14 Dec 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
96
6
0
25 Nov 2024
Mars: Situated Inductive Reasoning in an Open-World Environment
Xiaojuan Tang
Jiaqi Li
Yitao Liang
Song-chun Zhu
Muhan Zhang
Zilong Zheng
LM&Ro
LRM
LLMAG
29
1
0
10 Oct 2024
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
Prashanth Krishnamurthy
Farshad Khorrami
LM&Ro
39
3
0
08 Oct 2024
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
34
6
0
04 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
134
32
0
26 Sep 2024
SYNERGAI: Perception Alignment for Human-Robot Collaboration
Yixin Chen
Guoxi Zhang
Yaowei Zhang
Hongming Xu
Peiyuan Zhi
Qing Li
Siyuan Huang
37
0
0
24 Sep 2024
QueryCAD: Grounded Question Answering for CAD Models
Claudius Kienle
Benjamin Alt
Darko Katic
Rainer Jäkel
Jan Peters
28
2
0
13 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
75
15
0
05 Sep 2024
Multi-modal Situated Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Xuesong Niu
Xiaojian Ma
Baoxiong Jia
Siyuan Huang
36
11
0
04 Sep 2024
"Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus
Sean Andrist
Yuwei Bao
Eric Horvitz
Ann Paradiso
35
0
0
30 Aug 2024
Space3D-Bench: Spatial 3D Question Answering Benchmark
E. Szymańska
Mihai Dusmanu
J. Buurlage
Mahdi Rad
Marc Pollefeys
53
4
0
29 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
Yixuan Li
Zan Wang
Wei Liang
41
2
0
24 Aug 2024
Open-Ended 3D Point Cloud Instance Segmentation
Phuc D. A. Nguyen
Minh Luu
Anh Tran
Cuong Pham
Khoi Nguyen
3DPC
56
1
0
21 Aug 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng
Jun Wang
Chuanhao Li
Quanfeng Lu
Hao Tian
...
Jifeng Dai
Yu Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao
VLM
60
18
0
05 Aug 2024
Answerability Fields: Answerable Location Estimation via Diffusion Models
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
Koya Sakamoto
M. Kawanabe
DiffM
48
0
0
26 Jul 2024
3D Question Answering for City Scene Understanding
Penglei Sun
Yaoxian Song
Xiang Liu
Xiaofei Yang
Qiang-qiang Wang
Tiefeng Li
Yang Yang
Xiaowen Chu
18
1
0
24 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Yang Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
51
50
0
09 Jul 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
Chenming Zhu
Tai Wang
Wenwei Zhang
Kai Chen
Xihui Liu
ReLM
LRM
45
16
0
01 Jul 2024
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
Zihao Wang
Shaofei Cai
Zhancun Mu
Haowei Lin
Ceyao Zhang
Xuejie Liu
Qing Li
Anji Liu
Xiaojian Ma
Yitao Liang
LM&Ro
46
12
0
27 Jun 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Ruiyuan Lyu
Tai Wang
Jingli Lin
Shuai Yang
Xiaohan Mao
...
Runsen Xu
Haifeng Huang
Chenming Zhu
Dahua Lin
Jiangmiao Pang
3DV
49
9
0
13 Jun 2024
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man
Liang-Yan Gui
Yu-Xiong Wang
43
12
0
11 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
75
11
0
07 Jun 2024
Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models
Tianrun Chen
Chunan Yu
Jing Li
Jianqi Zhang
Lanyun Zhu
Deyi Ji
Yong Zhang
Ying Zang
Zejian Li
Lingyun Sun
LRM
49
9
0
29 May 2024
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu
Zhuofan Zhang
Xiaojian Ma
Xuesong Niu
Yixin Chen
Baoxiong Jia
Zhidong Deng
Siyuan Huang
Qing Li
48
21
0
19 May 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
53
22
0
16 May 2024
4D Panoptic Scene Graph Generation
Jingkang Yang
Jun Cen
Wenxuan Peng
Shuai Liu
Fangzhou Hong
Xiangtai Li
Kaiyang Zhou
Qifeng Chen
Ziwei Liu
45
13
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
33
13
0
16 May 2024
1
2
Next