Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.06157
Cited By
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
8 March 2025
Baining Zhao
Jianjie Fang
Zichao Dai
Ziyi Wang
Jirong Zha
Weichen Zhang
Chen Gao
Yijiao Wang
Jinqiang Cui
Xinlei Chen
Yongqian Li
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces"
21 / 21 papers shown
Title
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yongqian Li
Quanjun Yin
107
2
0
13 Apr 2025
Understanding World or Predicting Future? A Comprehensive Survey of World Models
Jingtao Ding
Yunke Zhang
Yu Shang
Yuheng Zhang
Zefang Zong
...
Hongyuan Su
Nian Li
Nicholas Sukiennik
Fengli Xu
Yong Li
SyDa
VGen
105
16
0
21 Nov 2024
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions
Qingbin Zeng
Qinglong Yang
Shunan Dong
Heming Du
Liang Zheng
Fengli Xu
Yong Li
LLMAG
LM&Ro
79
14
0
08 Aug 2024
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
Alessandro Suglia
Claudio Greco
Katie Baker
Jose L. Part
Ioannis Papaioannou
Arash Eshghi
Ioannis Konstas
Oliver Lemon
69
11
0
19 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
Xing Sun
VLM
MLLM
154
418
0
31 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B. Tenenbaum
Chuang Gan
128
195
0
15 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
115
637
0
25 Apr 2024
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Yuhao Wang
Yusheng Liao
Heyang Liu
Hongcheng Liu
Yu Wang
Yanfeng Wang
LRM
VLM
65
14
0
15 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
168
99
0
29 Dec 2023
Urban Generative Intelligence (UGI): A Foundational Platform for Agents in Embodied City Environment
Fengli Xu
Jun Zhang
Chen Gao
J. Feng
Yong Li
AI4CE
LLMAG
75
30
0
19 Dec 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
113
308
0
17 Aug 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
78
20
0
26 Jun 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Arjun Majumdar
Karmesh Yadav
Sergio Arnaud
Yecheng Jason Ma
Claire Chen
...
Dhruv Batra
Yixin Lin
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
LM&Ro
63
184
0
31 Mar 2023
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair
Aravind Rajeswaran
Vikash Kumar
Chelsea Finn
Abhi Gupta
LM&Ro
101
584
0
23 Mar 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
401
1,109
0
13 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
490
10,496
0
17 Jun 2021
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Longyin Wen
Dawei Du
Pengfei Zhu
Q. Hu
Qilong Wang
Liefeng Bo
Siwei Lyu
91
94
0
06 May 2021
Embodied Intelligence via Learning and Evolution
Agrim Gupta
Silvio Savarese
Surya Ganguli
Li Fei-Fei
AI4CE
91
251
0
03 Feb 2021
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
66
99
0
10 Nov 2020
Object Goal Navigation using Goal-Oriented Semantic Exploration
Devendra Singh Chaplot
Dhiraj Gandhi
Abhinav Gupta
Ruslan Salakhutdinov
94
523
0
01 Jul 2020
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
70
242
0
18 Nov 2019
1