1512.02902
MovieQA: Understanding Stories in Movies through Question-Answering
9 December 2015
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler
Papers citing
"MovieQA: Understanding Stories in Movies through Question-Answering"
50 / 200 papers shown
R^3-VQA: "Read the Room" by Video Social Reasoning
Lixing Niu
Jiapeng Li
Xingping Yu
Shu Wang
Ruining Feng
Bo Wu
Ping Wei
Yansen Wang
Lifeng Fan
51
0
0
07 May 2025
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Quynh Phung
Long Mai
Fabian Caba Heilbron
Feng Liu
Jia-Bin Huang
Cusuh Ham
DiffM
VGen
CoGe
111
0
0
28 Apr 2025
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
Hairu Wang
Kai Hu
Liangcai Gao
200
0
0
20 Mar 2025
Do Language Models Understand Time?
Xi Ding
Lei Wang
184
0
0
18 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
108
4
0
12 Dec 2024
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure
Sahithya Ravi
R. Ng
Vered Shwartz
Boyang Albert Li
Leonid Sigal
ReLM
LRM
VLM
82
2
0
07 Dec 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
68
1
0
15 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
201
2
0
11 Nov 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
49
5
0
24 Oct 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
46
0
0
14 Sep 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
55
5
0
31 Jul 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
36
5
0
21 Jul 2024
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
Rohit K Bharadwaj
Hanan Gani
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
70
3
0
14 Jun 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
65
20
0
13 Jun 2024
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang
Zehai He
Wenyi Hong
Yean Cheng
Xiaohan Zhang
...
Shiyu Huang
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
ELM
VLM
49
67
0
12 Jun 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
43
3
0
30 May 2024
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Bo Wu
Shoubin Yu
Zhenfang Chen
Joshua B Tenenbaum
Chuang Gan
46
178
0
15 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
45
40
0
14 May 2024
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
29
10
0
06 Mar 2024
YTCommentQA: Video Question Answerability in Instructional Videos
Saelyne Yang
Sunghyun Park
Yunseok Jang
Moontae Lee
31
3
0
30 Jan 2024
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti
Pascal Mettes
Hendrik Vincent Koops
Daan Odijk
Nanne van Noord
37
4
0
29 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
69
35
0
16 Jan 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
61
17
0
03 Jan 2024
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
Yunye Gong
Robik Shrestha
Jared Claypoole
Michael Cogswell
Arijit Ray
Christopher Kanan
Ajay Divakaran
36
0
0
20 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
59
1
0
30 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing
Anant Khandelwal
38
2
0
24 Oct 2023
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding
Xiao-Yu Guo
Yuan-Fang Li
Gholamreza Haffari
33
5
0
24 Oct 2023
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
30
22
0
05 Oct 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
32
2
0
27 Sep 2023
Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation
Xinyu Lyu
Jingwei Liu
Yuyu Guo
Lianli Gao
29
1
0
10 Aug 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
26
2
0
15 Jun 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
46
13
0
10 May 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
30
2
0
12 Apr 2023
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Hung-Ting Su
Yulei Niu
Xudong Lin
Winston H. Hsu
Shih-Fu Chang
VGen
ELM
29
6
0
07 Apr 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
26
5
0
27 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
35
7
0
23 Mar 2023
MUTANT: A Multi-sentential Code-mixed Hinglish Dataset
Rahul Gupta
Vivek Srivastava
M. Singh
34
1
0
23 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
33
15
0
17 Feb 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
19
1
0
17 Jan 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Ryota Tanaka
Kyosuke Nishida
Kosuke Nishida
Taku Hasegawa
Itsumi Saito
Kuniko Saito
25
73
0
12 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
28
53
0
05 Jan 2023
Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning
Yuyang Gao
Siyi Gu
Junji Jiang
S. Hong
Dazhou Yu
Liang Zhao
31
39
0
07 Dec 2022
NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization
Chao Zhao
Faeze Brahman
Kaiqiang Song
Wenlin Yao
Dian Yu
Snigdha Chaturvedi
HILM
26
7
0
02 Dec 2022
Watching the News: Towards VideoQA Models that can Read
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
32
18
0
10 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
24
4
0
29 Oct 2022
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering
Deepanway Ghosal
Navonil Majumder
Rada Mihalcea
Soujanya Poria
58
10
0
29 Oct 2022
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Minjoon Jung
Seongho Choi
Joo-Kyung Kim
Jin-Hwa Kim
Byoung-Tak Zhang
38
7
0
23 Oct 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
37
13
0
19 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
46
21
0
17 Oct 2022