2311.17005
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
28 November 2023
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Yi Liu
Zun Wang
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
Papers citing
"MVBench: A Comprehensive Multi-modal Video Understanding Benchmark"
50 / 313 papers shown
Title
Visual Context Window Extension: A New Perspective for Long Video Understanding
Hongchen Wei
Zhenzhong Chen
VLM
34
6
0
30 Sep 2024
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval
Yabing Wang
Le Wang
Qiang-feng Zhou
Zhibin Wang
Hao Li
Gang Hua
Wei Tang
33
7
0
30 Sep 2024
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Zicheng Zhang
Ziheng Jia
H. Wu
Chunyi Li
Zijian Chen
...
Wei Sun
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
32
7
0
30 Sep 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
37
1
0
29 Sep 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
39
16
0
26 Sep 2024
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Tieyuan Chen
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
VGen
CML
39
5
0
26 Sep 2024
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Chenming Zhu
Tai Wang
Wenwei Zhang
Jiangmiao Pang
Xihui Liu
134
32
0
26 Sep 2024
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Zhiyu Tan
Hao Li
Jingjing Chen
MLLM
61
18
0
25 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Yueze Wang
Tiejun Huang
Bo Zhao
VLM
52
41
0
22 Sep 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
42
9
0
21 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
38
7
0
19 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
86
55
0
19 Sep 2024
SoccerNet 2024 Challenges Results
A. Cioppa
Silvio Giancola
Vladimir Somers
Victor Joos
Floriane Magera
...
Yuan Li
Yuting Yang
Yuxuan Xiao
Zehua Cheng
Zhihao Li
31
2
0
16 Sep 2024
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Zhenyu Ning
Jieru Zhao
Qihao Jin
Wenchao Ding
Minyi Guo
29
5
0
11 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
38
4
0
10 Sep 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
77
1
0
06 Sep 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
Y. Guo
Faizan Siddiqui
Yang Zhao
Rama Chellappa
Shao-Yuan Lo
LRM
44
2
0
31 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
52
12
0
29 Aug 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma
Haojian Huang
Jin Chen
Haodong Chen
Pengliang Ji
...
Han Fang
Chao Ban
Hao Sun
Mulin Chen
Xuelong Li
37
7
0
29 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen
Shangzhe Di
Weidi Xie
29
12
0
26 Aug 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
Yiwei Ma
Jiayi Ji
Ke Ye
Weihuang Lin
Zhibin Wang
Yonghan Zheng
Qiang-feng Zhou
Xiaoshuai Sun
Rongrong Ji
46
5
0
26 Aug 2024
Multimodal Datasets and Benchmarks for Reasoning about Dynamic Spatio-Temporality in Everyday Environments
Takanori Ugai
Kensho Hara
S. Egami
Ken Fukuda
VGen
30
0
0
21 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
53
2
0
20 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
47
10
0
17 Aug 2024
Continuous Perception Benchmark
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
VLM
39
0
0
15 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
51
98
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
28
10
0
08 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
54
32
0
02 Aug 2024
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models
Zheng Liu
Hao Liang
Xijie Huang
Wentao Xiong
Qinhan Yu
Linzhuang Sun
Chong Chen
Conghui He
Bin Cui
Wentao Zhang
SyDa
55
0
0
30 Jul 2024
MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues
Liyun Zhang
38
1
0
23 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
62
49
0
22 Jul 2024
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong
Yuki Kawana
Rajat Saini
Ashutosh Kumar
Jingjing Pan
...
Yohei Ozao
Balázs Opra
D. Anastasiu
Yoichi Sato
Norimasa Kobori
VGen
41
8
0
22 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
39
115
0
16 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
60
5
0
11 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
32
0
0
11 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
52
198
0
10 Jul 2024
AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Jiangyan Yi
Bin Liu
Jianhua Tao
50
2
0
10 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
44
7
0
05 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
45
100
0
03 Jul 2024
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
Hao Liang
Jiapeng Li
Tianyi Bai
Xijie Huang
Linzhuang Sun
Zhengren Wang
Conghui He
Bin Cui
Chong Chen
Wentao Zhang
VGen
29
7
0
03 Jul 2024
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs
Haodong Chen
Haojian Huang
Junhao Dong
Mingzhe Zheng
Dian Shao
45
15
0
02 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
47
52
0
30 Jun 2024
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding
Kirolos Ataallah
Chenhui Gou
Eslam Abdelrahman
Khushbu Pahwa
Jian Ding
Mohamed Elhoseiny
VLM
32
8
0
28 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
43
7
0
26 Jun 2024
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
Baoqi Pei
Guo Chen
Jilan Xu
Yuping He
Yicheng Liu
...
Yifei Huang
Yali Wang
Tong Lu
Limin Wang
Yu Qiao
EgoV
45
14
0
26 Jun 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Yuxuan Wang
Yueqian Wang
Dongyan Zhao
Cihang Xie
Zilong Zheng
MLLM
VLM
52
26
0
24 Jun 2024