2411.16537
v4 (latest)
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
25 November 2024
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
Papers cited by "RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics"
17 / 67 papers shown
Title
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
Dieter Fox
Jesse Thomason
Animesh Garg
LM&Ro
LLMAG
212
659
0
22 Sep 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
131
185
0
30 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
102
20
0
05 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
220
1,991
0
04 Apr 2022
6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark
Stephen Tyree
Jonathan Tremblay
Thang To
Jia Cheng
Terry Mosier
Jeffrey Smith
Stan Birchfield
93
98
0
11 Mar 2022
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Chan Hee Song
Jihyung Kil
Tai-Yu Pan
Brian M. Sadler
Wei-Lun Chao
Yu-Chuan Su
LRM
80
33
0
14 Feb 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
Wenlong Huang
Pieter Abbeel
Deepak Pathak
Igor Mordatch
LM&Ro
129
1,129
0
18 Jan 2022
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daichi Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
108
208
0
20 Dec 2021
NLVR2 Visual Bias Analysis
Alane Suhr
Yoav Artzi
41
13
0
23 Sep 2019
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
Johanna Wald
A. Avetisyan
Nassir Navab
Federico Tombari
Matthias Nießner
80
160
0
16 Aug 2019
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
122
610
0
01 Nov 2018
Matterport3D: Learning from RGB-D Data in Indoor Environments
Angel X. Chang
Angela Dai
Thomas Funkhouser
Maciej Halber
Matthias Nießner
Manolis Savva
Shuran Song
Andy Zeng
Yinda Zhang
3DV
3DPC
221
1,923
0
18 Sep 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
745
4,104
0
14 Feb 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
372
2,394
0
20 Dec 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
490
5,779
0
23 Feb 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
468
5,527
0
03 May 2015
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
501
44,016
0
01 May 2014