ResearchTrend.AI
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

18 December 2024
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
    LRM

Papers citing "Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces"

20 / 20 papers shown
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
25
0
0
18 May 2025
SITE: towards Spatial Intelligence Thorough Evaluation
Wei Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong
52
0
0
08 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
52
0
0
06 May 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Gabriel Sarch
Balasaravanan Thoravi Kumaravel
Sahithya Ravi
Vibhav Vineet
A. D. Wilson
215
0
0
02 May 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
70
19
1
14 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yong Li
Quanjun Yin
39
0
0
13 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhengzhang Chen
Zongyu Lin
MLLM
VLM
MoE
219
4
0
10 Apr 2025
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
Kun Ouyang
Yuanxin Liu
Haoning Wu
Yi Liu
Hao Zhou
Jie Zhou
Fandong Meng
Xu Sun
LRM
33
0
0
02 Apr 2025
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Zhenyi Liao
Qingsong Xie
Yanhao Zhang
Zijian Kong
Haonan Lu
Zhenyu Yang
Zhijie Deng
ReLM
VLM
LRM
101
0
1
01 Apr 2025
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Heng Chang
Yujie Zhang
Tao Lin
Xiangrui Liu
Wenxiao Cai
Zhengyang Liang
Bo Zhao
LRM
58
3
0
31 Mar 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
100
0
0
29 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Junfei Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
54
20
0
27 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Yiyi Zhou
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
95
1
0
17 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen
LRM
59
2
0
16 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yujie Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
95
11
0
16 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Zehua Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
Chong Chen
Peng Li
Yang Liu
LRM
89
0
0
13 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
102
8
0
28 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi Ma
OffRL
70
63
0
28 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
61
5
0
21 Jan 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
88
7
0
02 Jan 2025