2502.13143
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
18 February 2025
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
Jingwen Li
Lingyun Xu
Bing Li
Xialin He
Guofan Fan
Jiazhao Zhang
Jiawei He
Jiayuan Gu
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
Papers citing "SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation"
45 / 45 papers shown
Title
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Muzhi Zhu
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
68
0
0
27 May 2025
Conditioning Matters: Training Diffusion Policies is Faster Than You Think
Zibin Dong
Yicheng Liu
Yinchuan Li
Hang Zhao
Haifeng Zhang
57
0
0
16 May 2025
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu
Ziming Li
Xuesong Shi
Chaoyi Xu
Yizhou Wang
He Wang
82
0
0
21 Mar 2025
HumanoidPano: Hybrid Spherical Panoramic-LiDAR Cross-Modal Perception for Humanoid Robots
Qiang Zhang
Zhang Zhang
Wei Cui
Jingkai Sun
Jiahang Cao
...
Hao-Ran Cheng
Yujie Chen
Liwen Wang
Jian Tang
Renjing Xu
82
3
0
12 Mar 2025
Learning Getting-Up Policies for Real-World Humanoid Robots
Xialin He
Runpei Dong
Zixuan Chen
Saurabh Gupta
86
6
0
17 Feb 2025
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World
Weixin Mao
Weiheng Zhong
Zhou Jiang
Dong Fang
Zhongyue Zhang
...
Fan Jia
Tiancai Wang
Haoqiang Fan
Osamu Yoshie
150
6
0
29 Nov 2024
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang
Sicheng Xu
Cassie Dai
Jianfeng Xiang
Yu Deng
Xin Tong
Jiaolong Yang
TPM
3DH
MDE
89
32
0
24 Oct 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
113
34
0
24 Jun 2024
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
Chenxi Wang
Hao-Shu Fang
Minghao Gou
Hongjie Fang
Jin Gao
Cewu Lu
70
113
0
17 Jun 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
...
Thomas Kollar
Sergey Levine
Chelsea Finn
109
197
0
19 Mar 2024
RT-H: Action Hierarchies Using Language
Suneel Belkhale
Tianli Ding
Ted Xiao
P. Sermanet
Quon Vuong
Jonathan Tompson
Yevgen Chebotar
Debidatta Dwibedi
Dorsa Sadigh
LM&Ro
55
81
0
04 Mar 2024
RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
Hanxiao Jiang
Binghao Huang
Ruihai Wu
Zhuoran Li
Shubham Garg
H. Nayyeri
Shenlong Wang
Yunzhu Li
78
19
0
23 Feb 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas Guibas
Dahua Lin
Gordon Wetzstein
EGVM
VGen
54
89
0
08 Jan 2024
Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting
Yunze Liu
Changxi Chen
Li Yi
50
16
0
14 Dec 2023
Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
Ivan Kapelyukh
Yifei Ren
Ignacio Alzugaray
Edward Johns
VLM
LM&Ro
65
20
0
07 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
101
36
0
05 Dec 2023
Open-vocabulary object 6D pose estimation
Jaime Corsetti
Davide Boscaini
Changjae Oh
Andrea Cavallaro
Fabio Poiesi
50
11
0
01 Dec 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
54
57
0
06 Nov 2023
3D Implicit Transporter for Temporally Consistent Keypoint Discovery
Chengliang Zhong
Yuhang Zheng
Yupeng Zheng
Hao Zhao
Li Yi
...
Pengfei Li
Guyue Zhou
Chao Yang
Xinliang Zhang
Jian Zhao
3DPC
67
16
0
10 Sep 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLM
LRM
46
54
0
18 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
226
4,085
0
09 Jun 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
473
13,788
0
15 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
184
148
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
60
46
0
01 Mar 2023
Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
Runpei Dong
Zekun Qi
Linfeng Zhang
Junbo Zhang
Jian‐Yuan Sun
Zheng Ge
Li Yi
Kaisheng Ma
ViT
3DPC
45
85
0
16 Dec 2022
FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects
Ben Eisner
Harry Zhang
David Held
80
90
0
09 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
276
3,458
0
29 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
107
1,901
0
04 Apr 2022
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang
Wenxiao Wang
Francis E. H. Tay
Wen Liu
Yonghong Tian
Liuliang Yuan
3DPC
ViT
53
461
0
13 Mar 2022
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Yunze Liu
Yun-Hai Liu
Chen Jiang
Kangbo Lyu
Weikang Wan
Hao Shen
Bo-Hua Liang
Zhoujie Fu
He Wang
Li Yi
68
176
0
03 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
570
9,009
0
28 Jan 2022
Human Hands as Probes for Interactive Object Understanding
Mohit Goyal
Sahil Modi
Rishabh Goyal
Saurabh Gupta
44
48
0
16 Dec 2021
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu
Lulu Tang
Yongming Rao
Tiejun Huang
Jie Zhou
Jiwen Lu
3DPC
93
661
0
29 Nov 2021
Voxel Transformer for 3D Object Detection
Jiageng Mao
Yujing Xue
Minzhe Niu
Haoyue Bai
Jiashi Feng
Xiaodan Liang
Hang Xu
Chunjing Xu
3DPC
ViT
54
407
0
06 Sep 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
666
28,659
0
26 Feb 2021
MVTN: Multi-View Transformation Network for 3D Shape Recognition
Abdullah Hamdi
Silvio Giancola
Guohao Li
3DV
3DPC
74
203
0
26 Nov 2020
Point Transformer
Nico Engel
Vasileios Belagiannis
Klaus C. J. Dietmayer
3DPC
140
1,966
0
02 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
317
40,217
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
461
41,106
0
28 May 2020
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions
Johanna Wald
Helisa Dhamo
Nassir Navab
Federico Tombari
3DV
3DPC
42
214
0
08 Apr 2020
Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data
Mikaela Angelina Uy
Quang Pham
Binh-Son Hua
D. Nguyen
Sai-Kit Yeung
3DV
3DPC
65
773
0
13 Aug 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
914
93,936
0
11 Oct 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
430
129,831
0
12 Jun 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
202
4,001
0
14 Feb 2017
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
C. Qi
Hao Su
Kaichun Mo
Leonidas Guibas
3DH
3DPC
3DV
PINN
395
14,191
0
02 Dec 2016