LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

30 November 2023
Sijin Chen
Xin Chen
C. Zhang
Mingsheng Li
Gang Yu
Hao Fei
Erik Cambria
Jiayuan Fan
Tao Chen
    MLLM

Papers citing "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"

50 / 74 papers shown
LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation
J. Huang
Xiaojian Ma
Xiongkun Linghu
Yue Fan
Junchao He
...
Qing Li
Song-Chun Zhu
Yixin Chen
Baoxiong Jia
Siyuan Huang
82
0
0
11 Jun 2025
SpatialLM: Training Large Language Models for Structured Indoor Modeling
Yongsen Mao
Junhao Zhong
Chuan Fang
Jia Zheng
Rui Tang
Hao Zhu
Ping Tan
Zihan Zhou
3DV
26
1
0
09 Jun 2025
Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models
Hugues Thomas
Chen Chen
Jian Zhang
46
0
0
06 Jun 2025
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs
Haoyuan Li
Yanpeng Zhou
Yufei Gao
Tao Tang
J. N. Han
Yujie Yuan
Dave Zhenyu Chen
Jiawang Bian
Hang Xu
Xiaodan Liang
119
0
0
05 Jun 2025
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang
Zhuofan Zhang
Ziyu Zhu
Yue Fan
Jing Xiong
Pengxiang Li
Xiaojian Ma
Qing Li
111
0
0
05 Jun 2025
Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models
Fangrui Zhu
Hanhui Wang
Yiming Xie
Jing Gu
Tianye Ding
Jianwei Yang
Huaizu Jiang
3DV
LRM
109
0
0
04 Jun 2025
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng
Shijia Huang
Yanyang Li
Liwei Wang
45
0
0
30 May 2025
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Diankun Wu
Fangfu Liu
Yi-Hsin Hung
Yueqi Duan
LRM
87
1
0
29 May 2025
3D Question Answering via only 2D Vision-Language Models
Fengyun Wang
Sicheng Yu
Jiawei Wu
Jinhui Tang
Hanwang Zhang
Qianru Sun
29
0
0
28 May 2025
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation
Naman Patel
Prashanth Krishnamurthy
Farshad Khorrami
82
0
0
21 May 2025
AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning
Kai Zhang
Xingyu Chen
Xiaofeng Zhang
110
0
0
19 May 2025
SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence
Jiabin Chen
Haiping Wang
Jinpeng Li
Yuan Liu
Zhen Dong
Bisheng Yang
154
0
0
19 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
207
0
0
18 May 2025
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Zihan Wang
Seungjun Lee
Gim Hee Lee
VGen
134
0
0
16 May 2025
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
Shun Taguchi
Hideki Deguchi
Takumi Hamazaki
Hiroyuki Sakai
ReLM
LRM
127
0
0
08 May 2025
3D CoCa: Contrastive Learners are 3D Captioners
Ting Huang
Zhenru Zhang
Yansen Wang
Hao Tang
96
1
0
13 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Xingjun Ma
Yu Jiang
VLM
127
6
0
08 Apr 2025
The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
Weichen Zhang
Ruiying Peng
Chen Gao
Jianjie Fang
Xin Zeng
...
Ziyi Wang
Jinqiang Cui
Xin Wang
Xinlei Chen
Yongqian Li
LRM
148
3
0
06 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
126
4
0
03 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Zhaoxiang Zhang
160
4
0
02 Apr 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
188
2
0
29 Mar 2025
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan
Yibo Peng
Jinke Ren
Yinghong Liao
Yatong Han
Chun-Mei Feng
Hengshuang Zhao
G. Li
Shuguang Cui
Zhen Li
129
0
0
29 Mar 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
J. Huang
Baoxiong Jia
Yansen Wang
Ziyu Zhu
Xiongkun Linghu
Qing Li
Song-Chun Zhu
Siyuan Huang
177
5
0
28 Mar 2025
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang
Runnan Chen
Ziwen Li
Zhengqing Gao
Xiao He
Yandong Guo
Mingming Gong
Tongliang Liu
LRM
109
1
0
23 Mar 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth
Dávid Rozenberszki
Angela Dai
145
0
0
21 Mar 2025
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
Xiaomeng Chu
Jiajun Deng
Guoliang You
Wei Liu
Xuzhao Li
Jianmin Ji
Yanzhe Zhang
132
0
0
20 Mar 2025
Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning
Xueying Jiang
Wenhao Li
Xiaoqin Zhang
Ling Shao
Shijian Lu
LRM
151
1
0
17 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
83
0
0
17 Mar 2025
Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation
Sihao Lin
Daqi Liu
Ruochong Fu
Dongrui Liu
A. Song
Hongwei Xie
Zhihui Li
Bing Wang
Xiaojun Chang
146
0
0
10 Mar 2025
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai
Songyou Peng
Kyle Genova
Leonidas Guibas
Thomas Funkhouser
3DGS
150
1
0
08 Mar 2025
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu
Wentong Li
Song Wang
Jintai Chen
Jianke Zhu
3DV
LRM
161
6
0
01 Mar 2025
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu
Xiang Deng
Qi Lv
Xiaoyang Chen
Yinchuan Li
Haifeng Zhang
Liqiang Nie
159
4
0
27 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
102
9
0
21 Feb 2025
Understanding and Evaluating Hallucinations in 3D Visual Language Models
Ruiying Peng
Kaiyuan Li
Weichen Zhang
Chen Gao
Xinlei Chen
Yongqian Li
182
1
0
18 Feb 2025
3D-Grounded Vision-Language Framework for Robotic Task Planning: Automated Prompt Synthesis and Supervised Reasoning
Guoqin Tang
Qingxuan Jia
Zeyuan Huang
Gang Chen
Ning Ji
Zhipeng Yao
112
0
0
13 Feb 2025
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding
Haomiao Xiong
Yunzhi Zhuge
Jiawen Zhu
Lu Zhang
Huchuan Lu
79
3
0
14 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
177
15
0
06 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
229
16
0
02 Jan 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
198
10
0
16 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
197
5
0
08 Dec 2024
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi
Peihao Chen
Junyan Li
Shuailei Ma
Xinyu Sun
Tianhang Xiang
Yinjie Lei
Mingkui Tan
Chuang Gan
176
8
0
02 Dec 2024
Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments
Sangmim Song
S. Kodagoda
A. Gunatilake
Marc G. Carmichael
Karthick Thiyagarajan
Jodi Martin
LM&Ro
163
1
0
28 Oct 2024
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Yu Zhao
Hao Fei
Xiangtai Li
L. Qin
Jiayi Ji
Erik Cambria
Meishan Zhang
Hao Fei
Jianguo Wei
DiffM
103
1
0
20 Oct 2024
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
Bohan Zeng
Ling Yang
Siyu Li
Jiaming Liu
Zixiang Zhang
...
Yongzhen Guo
Fu-Yun Wang
Minkai Xu
Stefano Ermon
Wentao Zhang
VGen
AI4CE
88
9
0
09 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Xipeng Qiu
AuLLM
109
9
0
09 Oct 2024
OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs
Venkata Naren Devarakonda
Raktim Gautam Goswami
Ali Umut Kaypak
Naman Patel
Rooholla Khorrambakht
Prashanth Krishnamurthy
Farshad Khorrami
LM&Ro
103
7
0
08 Oct 2024
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
131
8
0
04 Oct 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
248
52
0
26 Sep 2024
Multi-modal Situated Reasoning in 3D Scenes
Xiongkun Linghu
Jiangyong Huang
Xuesong Niu
Xiaojian Ma
Baoxiong Jia
Siyuan Huang
119
19
0
04 Sep 2024
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Jinfeng Xu
Yixue Hao
Long Hu
Min Chen
169
3
0
28 Aug 2024