Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

22 May 2025
Runsen Xu
Weiyao Wang
Hao Tang
Xingyu Chen
Xiaodong Wang
Fu-Jen Chu
Dahua Lin
Matt Feiszli
Kevin J. Liang
    LRM
arXiv: 2505.17015

Papers citing "Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models"

19 / 19 papers shown
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Sihan Yang
Runsen Xu
Yiman Xie
Sizhe Yang
Mo Li
...
Haodong Duan
Xiangyu Yue
Dahua Lin
Tai Wang
Jiangmiao Pang
VLM
LRM
43
0
0
29 May 2025
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang
Alexander Sax
Kevin J. Liang
Mikael Henaff
Hao Tang
Ang Cao
J. Chai
Franziska Meier
Matt Feiszli
3DGS
143
26
0
23 Jan 2025
Continuous 3D Perception Model with Persistent State
Qianqian Wang
Yifei Zhang
Aleksander Hołyński
Alexei A. Efros
Angjoo Kanazawa
VGen
105
41
0
21 Jan 2025
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
177
195
0
17 Jan 2025
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
151
15
0
25 Nov 2024
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri
Xiao-Yu Guo
Mona Golestan Far
Xin-Yao Yu
Gholamreza Haffari
Yuan-Fang Li
LRM
53
17
0
09 Nov 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
99
605
0
25 Apr 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Mu Hu
Wei Yin
C. Zhang
Zhipeng Cai
Xiaoxiao Long
Hao Chen
Kaixuan Wang
Gang Yu
Chunhua Shen
Shaojie Shen
3DGS
251
132
0
22 Mar 2024
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM
MLLM
100
613
0
03 Oct 2023
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception
Xiaqing Pan
Nicholas Charron
Yongqiang Yang
Scott Peters
Thomas Whelan
Chen Kong
Omkar M. Parkhi
Richard Newcombe
C. Ren
VGen
57
62
0
10 Jun 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
97
2,049
0
11 May 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
477
4,725
0
17 Apr 2023
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
148
3,444
0
16 Oct 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
269
2,468
0
15 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
356
3,532
0
29 Apr 2022
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
866
29,341
0
26 Feb 2021
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
86
847
0
22 Feb 2018
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
461
4,057
0
14 Feb 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
289
2,374
0
20 Dec 2016