Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.05251
Cited By
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
11 September 2023
Yiming Zhang
ZeMing Gong
Angel X. Chang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multi3DRefer: Grounding Text Description to Multiple 3D Objects"
48 / 48 papers shown
Title
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
49
0
0
08 May 2025
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
53
0
0
07 May 2025
3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation
Wenxin Chen
Mengxue Qu
Weitai Kang
Yan Yan
Yao Zhao
Yunchao Wei
46
0
0
17 Apr 2025
Multi-Object Grounding via Hierarchical Contrastive Siamese Transformers
Chengyi Du
Keyan Jin
32
0
0
14 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
79
0
0
09 Apr 2025
The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
Weichen Zhang
Ruiying Peng
Chen Gao
Jianjie Fang
Xin Zeng
...
Zihan Wang
Jinqiang Cui
Xin Wang
Xinlei Chen
Yong Li
LRM
81
0
0
06 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Zhaoxiang Zhang
75
0
0
02 Apr 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Yansen Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
87
3
0
28 Mar 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
Chenyangguang Zhang
Alexandros Delitzas
Fangjinhua Wang
Ruida Zhang
Xiangyang Ji
Marc Pollefeys
Francis Engelmann
3DV
3DPC
49
4
0
24 Mar 2025
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang
Runnan Chen
Ziwen Li
Zhengqing Gao
Xiao He
Yandong Guo
Mingming Gong
Tongliang Liu
LRM
56
0
0
23 Mar 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth
Dávid Rozenberszki
Angela Dai
85
0
0
21 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
45
0
0
17 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
Jintai Chen
Jianke Zhu
3DV
LRM
86
3
0
01 Mar 2025
LIFT-GS: Cross-Scene Render-Supervised Distillation for 3D Language Grounding
Ang Cao
Sergio Arnaud
Oleksandr Maksymets
Jianing Yang
Ayush Jain
...
Aravind Rajeswaran
Franziska Meier
Justin Johnson
Jeong Joon Park
Alexander Sax
70
0
0
27 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
Xinyi Wang
Na Zhao
Zhiyuan Han
Dan Guo
Xun Yang
51
1
0
17 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
88
7
0
02 Jan 2025
LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers
Yeong-Seung Baek
Heung-Seon Oh
34
0
0
07 Nov 2024
Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Haomeng Zhang
Chiao-An Yang
Raymond A. Yeh
41
1
0
29 Oct 2024
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
Xuexun Liu
Xiaoxu Xu
Jinlong Li
Qiudan Zhang
Xu Wang
N. Sebe
Lin Ma
42
0
0
17 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
134
32
0
26 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
78
15
0
05 Sep 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
51
7
0
30 Jul 2024
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He
Henghui Ding
Xudong Jiang
Bihan Wen
3DV
MLLM
3DPC
48
19
0
18 Jul 2024
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei Sun
Yaoxian Song
Xinglin Pan
Peijie Dong
Xiaofei Yang
Qiang-qiang Wang
Zhixu Li
Tiefeng Li
Xiaowen Chu
70
1
0
03 Jul 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
Ruiyuan Lyu
Tai Wang
Jingli Lin
Shuai Yang
Xiaohan Mao
...
Runsen Xu
Haifeng Huang
Chenming Zhu
Dahua Lin
Jiangmiao Pang
3DV
49
11
0
13 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
43
9
0
09 Jun 2024
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding
Yuhang Liu
Boyi Sun
Guixu Zheng
Yishuo Wang
Jing Wang
Fei-Yue Wang
42
2
0
24 May 2024
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu
Zhuofan Zhang
Xiaojian Ma
Xuesong Niu
Yixin Chen
Baoxiong Jia
Zhidong Deng
Siyuan Huang
Qing Li
48
21
0
19 May 2024
Grounded 3D-LLM with Referent Tokens
Yilun Chen
Shuai Yang
Haifeng Huang
Tai Wang
Ruiyuan Lyu
Runsen Xu
Dahua Lin
Jiangmiao Pang
57
24
0
16 May 2024
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
...
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
35
13
0
16 May 2024
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen
Huaijin Pi
Sida Peng
Zehong Shen
Minghui Yang
Shuai Zhu
Hujun Bao
Xiaowei Zhou
50
19
0
13 May 2024
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
Yongdong Luo
Haojia Lin
Xiawu Zheng
Yigeng Jiang
Rongrong Ji
Jie Hu
Guannan Jiang
Songan Zhang
Rongrong Ji
28
0
0
17 Apr 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Peng Jia
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
79
15
0
28 Mar 2024
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang
Yixin Chen
Baoxiong Jia
Puhao Li
Jinlu Zhang
Jingze Zhang
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
VGen
DiffM
49
36
0
26 Mar 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
34
0
0
25 Mar 2024
Can 3D Vision-Language Models Truly Understand Natural Language?
Weipeng Deng
Jihan Yang
Runyu Ding
Jiahui Liu
Yijiang Li
Xiaojuan Qi
Edith C.H. Ngai
42
4
0
21 Mar 2024
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
Rao Fu
Jingyu Liu
Xilun Chen
Yixin Nie
Wenhan Xiong
LM&Ro
LRM
52
55
0
18 Mar 2024
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang
Shaoxiang Wang
A. Pagani
Didier Stricker
43
7
0
05 Mar 2024
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
64
4
0
15 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
38
3
0
05 Dec 2023
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal
Daniel Gehrig
Suman Saha
Luc Van Gool
36
12
0
08 Sep 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhe-nan Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
92
4
0
12 Apr 2023
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
Nur Muhammad (Mahi) Shafiullah
Chris Paxton
Lerrel Pinto
Soumith Chintala
Arthur Szlam
VLM
LM&Ro
CLIP
98
157
0
11 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
59
63
0
29 Sep 2022
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
175
437
0
04 Dec 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Zhihao Yuan
Xu Yan
Yinghong Liao
Ruimao Zhang
Sheng Wang
Zhen Li
Shuguang Cui
71
129
0
01 Mar 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
337
3,720
0
11 Feb 2021
1