Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.17766
Cited By
v1
v2
v3 (latest)
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
27 February 2024
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ShapeLLM: Universal 3D Object Understanding for Embodied Interaction"
50 / 102 papers shown
Title
GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model
Zixiang Ai
Zichen Liu
Yuanhang Lei
Zhenyu Cui
Xu Zou
Jiahuan Zhou
96
1
0
07 May 2025
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang
Xiaolu Liu
Lingdong Kong
Jianyun Xu
Chunyong Hu
Gongfan Fang
Wentong Li
Jianke Zhu
Xinchao Wang
120
0
0
22 Apr 2025
Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI
Danaja Rutar
Alva Markelius
Konstantinos Voudouris
José Hernández-Orallo
Lucy G. Cheke
OCL
ELM
138
0
0
27 Mar 2025
Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV
LRM
126
1
0
08 Mar 2025
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu
Xiang Deng
Qi Lv
Xiaoyang Chen
Yinchuan Li
Haifeng Zhang
Liqiang Nie
140
4
0
27 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
197
16
0
02 Jan 2025
Do large language vision models understand 3D shapes?
Sagi Eppel
3DV
233
2
0
14 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
185
3
0
02 Dec 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
216
6
0
25 Nov 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
493
2
0
24 Nov 2024
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Jinfeng Xu
Yixue Hao
Long Hu
Min Chen
156
3
0
28 Aug 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
154
39
0
24 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
149
4
0
17 Jun 2024
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
Kuan-Chih Huang
Xiangtai Li
Lu Qi
Shuicheng Yan
Ming-Hsuan Yang
LRM
161
12
0
27 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
307
54
0
23 May 2024
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang
Shengcao Cao
Yu Wang
78
17
0
28 Feb 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas Guibas
Dahua Lin
Gordon Wetzstein
EGVM
VGen
101
96
0
08 Jan 2024
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
155
290
0
20 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM
VLM
77
169
0
01 Dec 2023
GOAT: GO to Any Thing
Matthew Chang
Théophile Gervet
Mukul Khanna
Sriram Yenamandra
Dhruv Shah
...
Saurabh Gupta
Dhruv Batra
Roozbeh Mottaghi
Jitendra Malik
Devendra Singh Chaplot
90
74
0
10 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
83
58
0
06 Nov 2023
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao
Zeyu Wang
Wei-Shi Zheng
Cihang Xie
Yuyin Zhou
3DPC
131
10
0
03 Nov 2023
Uni3D: Exploring Unified 3D Representation at Scale
Junsheng Zhou
Jinsheng Wang
Baorui Ma
Yu-Shen Liu
Tiejun Huang
Xinlong Wang
87
98
0
10 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
149
68
0
04 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zhangyang Wang
Yinfei Yang
MQ
111
50
0
02 Oct 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
121
579
0
03 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
96
144
0
01 Sep 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLM
LRM
69
54
0
18 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
127
518
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
154
238
0
07 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
458
4,444
0
09 Jun 2023
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast
Guo Fan
Zekun Qi
Wenkai Shi
Kaisheng Ma
3DPC
SSL
103
10
0
31 May 2023
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Le Xue
Ning Yu
Shu Zhen Zhang
Artemis Panagopoulou
Junnan Li
...
Jiajun Wu
Caiming Xiong
Ran Xu
Juan Carlos Niebles
Silvio Savarese
117
128
0
14 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
150
2,098
0
11 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
295
956
0
27 Apr 2023
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
Weikang Wan
Haoran Geng
Yun-Hai Liu
Zikang Shan
Yaodong Yang
Li Yi
He Wang
122
101
0
02 Apr 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
135
911
0
30 Mar 2023
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
Haoran Geng
Ziming Li
Yiran Geng
Jiayi Chen
Hao Dong
He Wang
3DPC
111
44
0
29 Mar 2023
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
66
29
0
29 Mar 2023
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Yaobo Liang
Chenfei Wu
Ting Song
Wenshan Wu
Yan Xia
...
Shaoguang Mao
Yuntao Wang
Linjun Shou
Ming Gong
Nan Duan
LLMAG
CLL
81
205
0
29 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
149
513
0
27 Mar 2023
CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis
Juntian Zheng
Qingyuan Zheng
Lixing Fang
Yun-Hai Liu
Li Yi
86
46
0
25 Mar 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
113
394
0
20 Mar 2023
Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance
Xueyi Liu
Ji Zhang
Ruizhen Hu
Haibin Huang
He Wang
Li Yi
3DPC
85
23
0
28 Feb 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
108
120
0
21 Dec 2022
Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
Runpei Dong
Zekun Qi
Linfeng Zhang
Junbo Zhang
Jian‐Yuan Sun
Zheng Ge
Li Yi
Kaisheng Ma
ViT
3DPC
101
91
0
16 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
79
134
0
13 Dec 2022
OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng
Kyle Genova
ChiyuMaxJiang
Andrea Tagliasacchi
Marc Pollefeys
Thomas Funkhouser
3DPC
VLM
113
367
0
28 Nov 2022
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
Xiangyang Zhu
Renrui Zhang
Bowei He
Ziyu Guo
Ziyao Zeng
Zipeng Qin
Shanghang Zhang
Peng Gao
VLM
90
145
0
21 Nov 2022
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
Haoran Geng
Helin Xu
Chengyan Zhao
Chao Xu
Li Yi
Siyuan Huang
He Wang
3DPC
108
100
0
10 Nov 2022
1
2
3
Next