2007.10937
MovieNet: A Holistic Dataset for Movie Understanding
21 July 2020
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
Papers citing
"MovieNet: A Holistic Dataset for Movie Understanding"
47 / 47 papers shown
Title
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
31
0
0
10 May 2025
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Quynh Phung
Long Mai
Fabian Caba Heilbron
Feng Liu
Jia-Bin Huang
Cusuh Ham
DiffM
VGen
CoGe
108
0
0
28 Apr 2025
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li
Y. Wu
Yi Lu
Jiashuo Yu
Licheng Tang
Jiawang Cao
Wenqing Zhu
Yuyang Sun
Jay Wu
Wenbo Zhu
39
0
0
24 Apr 2025
Towards Understanding Camera Motions in Any Video
Zhiqiu Lin
Siyuan Cen
Daniel Jiang
Jay Karhade
Hewei Wang
...
Rushikesh Zawar
Xue Bai
Yilun Du
Chuang Gan
Deva Ramanan
VGen
30
0
0
21 Apr 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
55
0
0
16 Apr 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
73
4
0
01 Mar 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
104
2
0
20 Dec 2024
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu
Mingyu Liu
Zeyu Zhu
Xi Xia
Haoen Feng
Wen Wang
Kevin Qinghong Lin
Chunhua Shen
Mike Zheng Shou
DiffM
VGen
119
1
0
22 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
151
2
0
11 Nov 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
83
54
0
19 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
43
5
0
31 Jul 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
53
3
0
24 May 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
56
7
0
21 Mar 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
57
4
0
19 Mar 2024
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti
Pascal Mettes
Hendrik Vincent Koops
Daan Odijk
Nanne van Noord
31
4
0
29 Jan 2024
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
38
259
0
28 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
56
399
0
28 Nov 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
32
36
0
10 Oct 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
19
1
0
17 Jan 2023
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
21
2
0
09 Dec 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Huayu Chen
K. Cole-McLaughlin
Haoran Wang
Shrikanth Narayanan
CLIP
16
21
0
20 Oct 2022
Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows
Anyi Rao
Xuekun Jiang
Sichen Wang
Yuwei Guo
Zihao Liu
Bo Dai
Long Pang
Xiaoyu Wu
Dahua Lin
Libiao Jin
21
6
0
17 Oct 2022
The One Where They Reconstructed 3D Humans and Environments in TV Shows
Georgios Pavlakos
Ethan Weber
Matthew Tancik
Angjoo Kanazawa
3DH
50
23
0
28 Jul 2022
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
50
22
0
20 Jul 2022
Scene Consistency Representation Learning for Video Scene Segmentation
Haoqian Wu
Keyu Chen
Yanan Luo
Ruizhi Qiao
Bo Ren
Haozhe Liu
Weicheng Xie
Linlin Shen
SSL
40
16
0
11 May 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
44
24
0
06 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
43
102
0
04 Apr 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
25
19
0
23 Mar 2022
Learning-by-Narrating: Narrative Pre-Training for Zero-Shot Dialogue Comprehension
Chao Zhao
Wenlin Yao
Dian Yu
Kaiqiang Song
Dong Yu
Jianshu Chen
15
5
0
19 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
33
10
0
11 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
43
14
0
01 Mar 2022
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun
Minchul Shin
Gunsoo Han
Sangho Lee
S. Ha
Joonseok Lee
Eun-Sol Kim
SSL
46
20
0
14 Jan 2022
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge J. Belongie
Ming-Hsuan Yang
Hartwig Adam
Huayu Chen
AI4TS
59
6
0
08 Dec 2021
Tracking People by Predicting 3D Appearance, Location & Pose
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Jitendra Malik
3DH
27
15
0
08 Dec 2021
Head and Body: Unified Detector and Graph Network for Person Search in Media
Xiujun Shu
Yusheng Tao
Ruizhi Qiao
Bo Ke
Wei Wen
Bo Ren
30
2
0
27 Nov 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krähenbühl
VLM
ViT
46
165
0
21 Jun 2021
Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark
Xiujun Shu
Tianlin Li
Xian Zhang
Shiliang Zhang
Yuanqi Chen
Gezhong Li
Q. Tian
AI4TS
21
72
0
31 May 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
Shixing Chen
Xiaohan Nie
David D. Fan
Dongqing Zhang
Vimal Bhat
Raffay Hamid
SSL
27
62
0
28 Apr 2021
Learning the Predictability of the Future
Dídac Surís
Ruoshi Liu
Carl Vondrick
24
71
0
01 Jan 2021
Human Mesh Recovery from Multiple Shots
Georgios Pavlakos
Jitendra Malik
Angjoo Kanazawa
3DH
45
57
0
17 Dec 2020
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Philip H. S. Torr
46
81
0
17 Dec 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
Anyi Rao
Jiaze Wang
Linning Xu
Xuekun Jiang
Qingqiu Huang
Bolei Zhou
Dahua Lin
18
60
0
08 Aug 2020
Online Multi-modal Person Search in Videos
J. Xia
Anyi Rao
Qingqiu Huang
Linning Xu
Jiangtao Wen
Dahua Lin
28
28
0
08 Aug 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
39
100
0
08 May 2020
From Trailers to Storylines: An Efficient Way to Learn from Movies
Qingqiu Huang
Yuanjun Xiong
Yu Xiong
Yuqi Zhang
Dahua Lin
28
26
0
14 Jun 2018
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016