2409.18938
Cited By
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
27 September 2024
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
Fengmao Lv
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
Papers citing "From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding"
6 / 6 papers shown
Title
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
117
0
0
06 May 2025
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo
Ziyang Chen
Shaoguang Wang
Jianxiang He
Yijie Xu
Jinhui Ye
Ying Sun
Hui Xiong
121
4
0
17 Mar 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
204
8
0
29 Dec 2024
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Honggu Chen
Yan Gao
Yao Hu
Zhan Qin
VLM
113
8
0
18 Apr 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
326
577
0
07 Mar 2024
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
150
209
0
12 Jun 2023