ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.08830
  4. Cited By
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

18 December 2019
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
    3DPC
ArXivPDFHTML

Papers citing "ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language"

50 / 238 papers shown
Title
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for
  End-to-End 3D Referring Expression Segmentation
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
Changli Wu
Yiwei Ma
Qi Chen
Haowei Wang
Gen Luo
Jiayi Ji
Xiaoshuai Sun
3DV
36
19
0
31 Aug 2023
Towards Real Time Egocentric Segment Captioning for The Blind and
  Visually Impaired in RGB-D Theatre Images
Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images
Khadidja Delloul
S. Larabi
32
2
0
26 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person
  Perception of Ego4D
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
26
14
0
23 Aug 2023
A Unified Framework for 3D Point Cloud Visual Grounding
A Unified Framework for 3D Point Cloud Visual Grounding
Haojia Lin
Yongdong Luo
Xiawu Zheng
Lijiang Li
Rongrong Ji
Taisong Jin
Donghao Luo
Yan Wang
Liujuan Cao
Rongrong Ji
23
3
0
23 Aug 2023
Chat-3D: Data-efficiently Tuning Large Language Model for Universal
  Dialogue of 3D Scenes
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
19
62
0
17 Aug 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
34
104
0
08 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic
  and Regional Comprehension
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
34
27
0
03 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects
  in Cluttered Indoor Scenes
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
Yuhao Lu
Yixuan Fan
Beixing Deng
F. Liu
Yali Li
Shengjin Wang
38
29
0
01 Aug 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
3DPC
34
20
0
25 Jul 2023
Enhancing image captioning with depth information using a
  Transformer-based framework
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
24
4
0
24 Jul 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
  Supervised 3D Visual Grounding
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
24
17
0
18 Jul 2023
Scalable 3D Captioning with Pretrained Models
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
26
152
0
12 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
  Framework, and Benchmark
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
32
152
0
11 Jun 2023
Multi-CLIP: Contrastive Vision-Language Pre-training for Question
  Answering tasks in 3D Scenes
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas
Maria Parelli
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
12
19
0
04 Jun 2023
GRES: Generalized Referring Expression Segmentation
GRES: Generalized Referring Expression Segmentation
Chang Liu
Henghui Ding
Xudong Jiang
36
141
0
01 Jun 2023
Language-Guided 3D Object Detection in Point Cloud for Autonomous
  Driving
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving
Wenhao Cheng
Junbo Yin
Wei Li
Ruigang Yang
Jianbing Shen
3DPC
22
14
0
25 May 2023
Weakly Supervised 3D Open-vocabulary Segmentation
Weakly Supervised 3D Open-vocabulary Segmentation
Kunhao Liu
Fangneng Zhan
Jiahui Zhang
Muyu Xu
Yingchen Yu
Abdulmotaleb El Saddik
Christian Theobalt
Eric P. Xing
Shijian Lu
25
66
0
23 May 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
Taiki Miyanishi
Daich Azuma
Shuhei Kurita
M. Kawanabe
36
2
0
23 May 2023
Connecting Multi-modal Contrastive Representations
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
30
22
0
22 May 2023
Vision-Language Pre-training with Object Contrastive Learning for 3D
  Scene Understanding
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Zhang Tao
Su He
D. Tao
Bin Chen
Zhi Wang
Shutao Xia
VLM
32
22
0
18 May 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D
  Scenes
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
20
50
0
12 Apr 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with
  Multi-modal Visual Data and Natural Language
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhe-nan Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
82
4
0
12 Apr 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with
  GPT and Prototype Guidance
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
35
54
0
29 Mar 2023
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic
  Scene Graph Prediction in Point Cloud
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
Ziqin Wang
Bowen Cheng
Lichen Zhao
Dong Xu
Yang Tang
Lu Sheng
3DPC
27
27
0
25 Mar 2023
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
Joy Hsu
Jiayuan Mao
Jiajun Wu
PINN
48
48
0
23 Mar 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference
  Understanding
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
46
7
0
23 Mar 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models
Text2Tex: Text-driven Texture Synthesis via Diffusion Models
Dave Zhenyu Chen
Yawar Siddiqui
Hsin-Ying Lee
Sergey Tulyakov
Matthias Nießner
DiffM
24
191
0
20 Mar 2023
3D Concept Learning and Reasoning from Multi-View Images
3D Concept Learning and Reasoning from Multi-View Images
Yining Hong
Chun-Tse Lin
Yilun Du
Zhenfang Chen
J. Tenenbaum
Chuang Gan
3DV
25
52
0
20 Mar 2023
MXM-CLR: A Unified Framework for Contrastive Learning of Multifold
  Cross-Modal Representations
MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations
Ye Wang
Bo‐Shu Jiang
C. Zou
Rui Ma
32
5
0
20 Mar 2023
Narrator: Towards Natural Control of Human-Scene Interaction Generation
  via Relationship Reasoning
Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning
Haibiao Xuan
Xiongzheng Li
Jinsong Zhang
Hongwen Zhang
Yebin Liu
Kun Li
26
7
0
16 Mar 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
  Pre-training
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzhi Li
Pheng-Ann Heng
3DPC
34
52
0
27 Feb 2023
Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding
Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding
Yaoxian Song
Penglei Sun
Piaopiao Jin
Yi Ren
Yu Zheng
Zhixu Li
Xiaowen Chu
Yueying Zhang
Tiefeng Li
Jason Gu
69
16
0
27 Jan 2023
Text to Point Cloud Localization with Relation-Enhanced Transformer
Text to Point Cloud Localization with Relation-Enhanced Transformer
Guangzhi Wang
Hehe Fan
Mohan S. Kankanhalli
3DPC
30
14
0
13 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Erik Cambria
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
21
52
0
06 Jan 2023
LidarCLIP or: How I Learned to Talk to Point Clouds
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle AAstrom
Lennart Svensson
DiffM
27
22
0
13 Dec 2022
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved
  Visio-Linguistic Models in 3D Scenes
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
22
28
0
12 Dec 2022
LADIS: Language Disentanglement for 3D Shape Editing
LADIS: Language Disentanglement for 3D Shape Editing
Ian Huang
Panos Achlioptas
Tianyi Zhang
Sergey Tulyakov
Minhyuk Sung
Leonidas J. Guibas
31
10
0
09 Dec 2022
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual
  Grounding
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Ronghang Hu
Xinlei Chen
Matthias Nießner
Angel X. Chang
29
52
0
01 Dec 2022
OpenScene: 3D Scene Understanding with Open Vocabularies
OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng
Kyle Genova
ChiyuMaxJiang
Andrea Tagliasacchi
Marc Pollefeys
Thomas Funkhouser
3DPC
VLM
40
348
0
28 Nov 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for
  3D Visual Grounding
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
23
41
0
25 Nov 2022
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
Junbo Zhang
Guo Fan
Guanghan Wang
Zhèngyuān Sū
Kaisheng Ma
L. Yi
3DPC
27
7
0
25 Nov 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
51
75
0
17 Nov 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Jiaming Chen
Weihua Luo
Ran Song
Xiaolin K. Wei
Lin Ma
Wei Emma Zhang
3DV
40
6
0
22 Oct 2022
PoseScript: Linking 3D Human Poses and Natural Language
PoseScript: Linking 3D Human Poses and Natural Language
Ginger Delmas
Philippe Weinzaepfel
Thomas Lucas
Francesc Moreno-Noguer
Grégory Rogez
3DH
38
1
0
21 Oct 2022
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
43
104
0
18 Oct 2022
SQA3D: Situated Question Answering in 3D Scenes
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
33
132
0
14 Oct 2022
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
Nur Muhammad (Mahi) Shafiullah
Chris Paxton
Lerrel Pinto
Soumith Chintala
Arthur Szlam
VLM
LM&Ro
CLIP
95
156
0
11 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
44
15
0
08 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas J. Guibas
Sergey Tulyakov
79
11
0
04 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A
  Neurosymbolic Approach
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
H. Kasaei
LM&Ro
20
3
0
03 Oct 2022
Previous
12345
Next