Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.00476
Cited By
TempCompass: Do Video LLMs Really Understand Videos?
1 March 2024
Yuanxin Liu
Shicheng Li
Yi Liu
Yuxiang Wang
Shuhuai Ren
Lei Li
Sishuo Chen
Xu Sun
Lu Hou
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TempCompass: Do Video LLMs Really Understand Videos?"
50 / 85 papers shown
Title
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
Jiafeng Liang
Shixin Jiang
Xuan Dong
Ning Wang
Zheng Chu
Hui Su
Jinlan Fu
Ming Liu
See-Kiong Ng
Bing Qin
9
0
0
20 May 2025
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
Wentao Ma
Weiming Ren
Yiming Jia
Zhuofeng Li
Ping Nie
Ge Zhang
Wenhu Chen
7
0
0
20 May 2025
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu
Zekun Li
Zheqi He
Peipei Li
Shuhan Xia
Xing Cui
Huaibo Huang
Xi Yang
Ran He
EGVM
AAML
28
0
0
17 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping Huang
OffRL
57
0
0
08 May 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Yiming Lei
Chenkai Zhang
Ziqiang Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
56
0
0
30 Apr 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li
Y. Wu
Yi Lu
Jiashuo Yu
Licheng Tang
Jiawang Cao
Wenqing Zhu
Yuyang Sun
Jay Wu
Wenbo Zhu
39
0
0
24 Apr 2025
ZipR1: Reinforcing Token Sparsity in MLLMs
Feng Chen
Yefei He
Lequan Lin
Qingbin Liu
Bohan Zhuang
Qi Wu
51
0
0
23 Apr 2025
VideoVista-CulturalLingo: 360
∘
^\circ
∘
Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
Xinyu Chen
Yunxin Li
Haoyuan Shi
Baotian Hu
Wenhan Luo
Yaowei Wang
Hao Fei
ELM
67
0
0
23 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
62
0
0
20 Apr 2025
VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
Yogesh Kulkarni
Pooyan Fazli
38
0
0
18 Apr 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang
Haodong Chen
Shengqiong Wu
Meng Luo
Jinlan Fu
Xinya Du
Hao Zhang
Hao Fei
AI4TS
205
1
0
17 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
38
0
0
16 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yujie Zhang
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen
VLM
73
0
0
14 Apr 2025
VideoAds for Fast-Paced Video Understanding: Where Opensource Foundation Models Beat GPT-4o & Gemini-1.5 Pro
Zheyuan Zhang
Monica Dou
Linkai Peng
Hongyi Pan
Ulas Bagci
Boqing Gong
VLM
61
0
0
12 Apr 2025
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
33
0
0
10 Apr 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
29
0
0
08 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Kaipeng Zhang
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM
VLM
239
0
0
08 Apr 2025
InstructionBench: An Instructional Video Understanding Benchmark
Haiwan Wei
Yitian Yuan
Xiaohan Lan
Wei Ke
Lin Ma
ELM
46
0
0
07 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
44
8
0
07 Apr 2025
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim
A. Piergiovanni
Ganesh Mallya
A. Angelova
CoGe
41
0
0
04 Apr 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Min Shi
Shihao Wang
Chieh-Yun Chen
Jitesh Jain
Kai Wang
Junjun Xiong
Guilin Liu
Zhiding Yu
Humphrey Shi
40
2
0
02 Apr 2025
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding
Qi Wu
Quanlong Zheng
Yanhao Zhang
Junlin Xie
Jinguo Luo
...
Peng Liu
Qingsong Xie
Ru Zhen
Haonan Lu
Zhenyu Yang
VLM
62
0
0
31 Mar 2025
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Yi Chen
Yuying Ge
Rui Wang
Yixiao Ge
Lu Qiu
Ying Shan
Xihui Liu
ReLM
VLM
OffRL
LRM
64
2
0
31 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
54
0
0
29 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Junfei Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
54
14
0
27 Mar 2025
Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering
Erika Mori
Yue Qiu
Hirokatsu Kataoka
Y. Aoki
55
0
0
27 Mar 2025
ACVUBench: Audio-Centric Video Understanding Benchmark
Yuqing Yang
Jimin Zhuang
Guangzhi Sun
Changli Tang
Yong Li
P. Li
Yifan Jiang
W. Li
Z. Ma
Chao Zhang
AuLLM
CoGe
73
0
0
25 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
100
1
0
24 Mar 2025
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella
Massimiliano Mancini
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
60
0
0
24 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yuqing Yang
Afshin Dehghan
59
2
0
24 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
48
0
0
22 Mar 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
Haoyu Wang
VLM
MQ
91
0
0
20 Mar 2025
Impossible Videos
Zechen Bai
Hai Ci
Mike Zheng Shou
EGVM
VGen
72
0
0
18 Mar 2025
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari
A. Tonioni
Yongqin Xian
Achin Kulshrestha
F. Tombari
EgoV
61
0
0
17 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
78
2
0
14 Mar 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu
Rui Zhu
Shuhuai Ren
Jiacong Wang
Haoyuan Guo
Xu Sun
Lu Jiang
178
1
0
13 Mar 2025
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs
Yunxiao Wang
Meng Liu
Rui Shao
Haoyu Zhang
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Liqiang Nie
70
1
0
13 Mar 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
Zikang Liu
Longteng Guo
Yepeng Tang
Tongtian Yue
Junxian Cai
Kai Ma
Qingbin Liu
Xi Chen
Jing Liu
49
0
0
17 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Yong Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
66
4
0
17 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
64
4
0
17 Feb 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
58
5
0
21 Jan 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong
Yean Cheng
Zhiyong Yang
Weihan Wang
Lefan Wang
Xiaotao Gu
Shiyu Huang
Yuxiao Dong
J. Tang
CoGe
VLM
75
4
0
06 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
63
33
0
03 Jan 2025
SCBench: A Sports Commentary Benchmark for Video LLMs
Kuangzhi Ge
L. Chen
Kevin Zhang
Yulin Luo
Tianyu Shi
Liaoyuan Fan
Xiang Li
Guanqun Wang
Shanghang Zhang
46
0
0
23 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
126
56
0
18 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
K. Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
84
12
0
12 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Haozhao Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
84
1
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
109
1
0
12 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
Jing Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
100
2
0
02 Dec 2024
1
2
Next