ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.10255
  4. Cited By
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks
  via Multi-modal Large Language Models

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

16 May 2024
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
Jian Ding
Jindong Gu
Dave Zhenyu Chen
Songyou Peng
Jiawang Bian
Philip Torr
Marc Pollefeys
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
    LRM
ArXiv (abs)PDFHTMLGithub (1689★)

Papers citing "When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models"

50 / 155 papers shown
Title
SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence
SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence
Jiabin Chen
Haiping Wang
Jinpeng Li
Yuan Liu
Zhen Dong
Bisheng Yang
127
0
0
19 May 2025
Do large language vision models understand 3D shapes?
Do large language vision models understand 3D shapes?
Sagi Eppel
3DV
203
1
0
14 Dec 2024
Text-to-3D Shape Generation
Text-to-3D Shape Generation
Han-Hung Lee
Manolis Savva
Angel X. Chang
59
13
0
20 Mar 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomas Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALMLM&MAELM
198
408
0
09 Feb 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas Guibas
Dahua Lin
Gordon Wetzstein
EGVMVGen
76
95
0
08 Jan 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards
  Embodied AI
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang
Xiaohan Mao
Chenming Zhu
Runsen Xu
Ruiyuan Lyu
...
Tianfan Xue
Xihui Liu
Cewu Lu
Dahua Lin
Jiangmiao Pang
LM&Ro
70
73
0
26 Dec 2023
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR
  Understanding
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
90
67
0
21 Dec 2023
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
65
26
0
17 Dec 2023
SMERF: Streamable Memory Efficient Radiance Fields for Real-Time
  Large-Scene Exploration
SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
Daniel Duckworth
Peter Hedman
Christian Reiser
Peter Zhizhin
Jean-François Thibert
Mario Lucic
Richard Szeliski
Jonathan T. Barron
3DGS
49
53
0
12 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
92
5
0
05 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
135
36
0
05 Dec 2023
Towards Learning a Generalist Model for Embodied Navigation
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng
Shijia Huang
Lin Zhao
Yiwu Zhong
Liwei Wang
LM&Ro
108
51
0
04 Dec 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
93
21
0
29 Nov 2023
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion
  Priors
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen
Haoxuan Li
Hsin-Ying Lee
Sergey Tulyakov
Matthias Nießner
DiffM
63
29
0
28 Nov 2023
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller
Angela Dai
100
68
0
27 Nov 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
89
30
0
26 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D
  Data
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
81
58
0
06 Nov 2023
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic
  Matching
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li
Jingyi Lu
Kai Han
V. Prisacariu
DiffM
69
21
0
26 Oct 2023
PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction
PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction
Jiawang Bian
Wenjing Bian
V. Prisacariu
Philip Torr
67
13
0
11 Oct 2023
Uni3D: Exploring Unified 3D Representation at Scale
Uni3D: Exploring Unified 3D Representation at Scale
Junsheng Zhou
Jinsheng Wang
Baorui Ma
Yu-Shen Liu
Tiejun Huang
Xinlong Wang
73
96
0
10 Oct 2023
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
246
176
0
04 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable
  Autonomous Driving
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
92
204
0
03 Oct 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language
  Model as an Agent
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&RoLLMAG
130
97
0
21 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D
  Understanding, Generation, and Instruction Following
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
80
144
0
01 Sep 2023
Beyond First Impressions: Integrating Joint Multi-modal Cues for
  Comprehensive 3D Representation
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
Haowei Wang
Jiji Tang
Jiayi Ji
Xiaoshuai Sun
Rongsheng Zhang
...
Minda Zhao
Lincheng Li
zeng zhao
Tangjie Lv
Rongrong Ji
3DV
61
15
0
06 Aug 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
109
510
0
12 Jul 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Ayca Takmaz
Elisabetta Fedele
R. Sumner
Marc Pollefeys
F. Tombari
Francis Engelmann
ISegVLM
75
173
0
23 Jun 2023
Multi-CLIP: Contrastive Vision-Language Pre-training for Question
  Answering tasks in 3D Scenes
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas
Maria Parelli
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
43
20
0
04 Jun 2023
VL-Fields: Towards Language-Grounded Neural Implicit Spatial
  Representations
VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations
Nikolaos Tsagkas
Oisin Mac Aodha
Chris Xiaoxuan Lu
VLM
77
26
0
21 May 2023
AvatarCraft: Transforming Text into Neural Human Avatars with
  Parameterized Shape and Pose Control
AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
Ruixia Jiang
Can Wang
Jingbo Zhang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
DiffM
52
81
0
30 Mar 2023
Fantasia3D: Disentangling Geometry and Appearance for High-quality
  Text-to-3D Content Creation
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
Rui Chen
Yuxiao Chen
Ningxin Jiao
Kui Jia
DiffM
105
588
0
24 Mar 2023
Compositional 3D Scene Generation using Locally Conditioned Diffusion
Compositional 3D Scene Generation using Locally Conditioned Diffusion
Ryan Po
Gordon Wetzstein
DiffM
71
88
0
21 Mar 2023
3D Concept Learning and Reasoning from Multi-View Images
3D Concept Learning and Reasoning from Multi-View Images
Yining Hong
Chun-Tse Lin
Yilun Du
Zhenfang Chen
J. Tenenbaum
Chuang Gan
3DV
77
52
0
20 Mar 2023
MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in
  Unbounded Scenes
MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes
Christian Reiser
Richard Szeliski
Dor Verbin
Pratul P. Srinivasan
B. Mildenhall
Andreas Geiger
Jonathan T. Barron
Peter Hedman
104
233
0
23 Feb 2023
ConceptFusion: Open-set Multimodal 3D Mapping
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula
Ali Kuwajerwala
Qiao Gu
Mohd. Omama
Tao Chen
...
Celso Miguel de Melo
Madhava Krishna
Liam Paull
Florian Shkurti
Antonio Torralba
73
245
0
14 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
429
4,563
0
30 Jan 2023
Text-To-4D Dynamic Scene Generation
Text-To-4D Dynamic Scene Generation
Uriel Singer
Shelly Sheynin
Adam Polyak
Oron Ashual
Iurii Makarov
...
Naman Goyal
Andrea Vedaldi
Devi Parikh
Justin Johnson
Yaniv Taigman
DiffM
80
155
0
26 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Erik Cambria
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
54
58
0
06 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and
  Text-to-Image Diffusion Models
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
236
164
0
28 Dec 2022
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large
  Language Models
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Chan Hee Song
Jiaman Wu
Clay Washington
Brian M Sadler
Wei-Lun Chao
Yu-Chuan Su
LLMAGLM&Ro
124
417
0
08 Dec 2022
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual
  Grounding
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Ronghang Hu
Xinlei Chen
Matthias Nießner
Angel X. Chang
100
54
0
01 Dec 2022
OpenScene: 3D Scene Understanding with Open Vocabularies
OpenScene: 3D Scene Understanding with Open Vocabularies
Songyou Peng
Kyle Genova
ChiyuMaxJiang
Andrea Tagliasacchi
Marc Pollefeys
Thomas Funkhouser
3DPCVLM
92
363
0
28 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
207
1,813
0
17 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
394
2,388
0
09 Nov 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLMLRM
194
3,146
0
20 Oct 2022
SQA3D: Situated Question Answering in 3D Scenes
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
72
148
0
14 Oct 2022
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang
Bichen Wu
Xiaoliang Dai
Kunpeng Li
Yinan Zhao
Hang Zhang
Peizhao Zhang
Peter Vajda
Diana Marculescu
CLIPVLM
102
457
0
09 Oct 2022
Automatic Chain of Thought Prompting in Large Language Models
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLMLRM
150
621
0
07 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
162
1,540
0
05 Oct 2022
3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable
  Scene Graphs
3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs
Sam Looper
Javier Rodriguez Puigvert
Roland Siegwart
Cesar Cadena
L. Schmid
3DPC
51
23
0
16 Sep 2022
1234
Next