ResearchTrend.AI

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

29 May 2025
Diankun Wu
Fangfu Liu
Yi-Hsin Hung
Yueqi Duan
    LRM

Papers citing "Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence"

10 / 10 papers shown
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
Zheda Mai
A. Chowdhury
Zihe Wang
Sooyoung Jeon
Lemeng Wang
Jiacheng Hou
Jihyung Kil
Wei-Lun Chao
CoGe
47
0
0
10 Jun 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
99
21
0
08 Apr 2025
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yongbin Li
Yize Zhang
Tao Lin
Xiangrui Liu
Wenxiao Cai
Zhengyang Liang
Bo Zhao
LRM
119
9
0
31 Mar 2025
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
Jianing Qi
Jiawei Liu
Hao Tang
Zhigang Zhu
156
4
0
21 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM CLIP MLLM
191
7
0
19 Mar 2025
ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos
Peiran Wu
Yunze Liu
Chonghan Liu
Miao Liu
VGen LRM
120
7
0
16 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
405
699
0
20 Feb 2025
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Zhangyang Qi
Zhixiong Zhang
Ye Fang
Jiaqi Wang
Hengshuang Zhao
213
16
0
02 Jan 2025
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
239
52
0
26 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
200
72
0
19 Sep 2024