MovieQA: Understanding Stories in Movies through Question-Answering

9 December 2015
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler

Papers citing "MovieQA: Understanding Stories in Movies through Question-Answering"

50 / 202 papers shown
Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper
Sam Goree
G. Appleby
David J. Crandall
Norman Su
31
2
0
22 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
79
7
0
14 Sep 2022
Classical Sequence Match is a Competitive Few-Shot One-Class Learner
Mengting Hu
H. Gao
Yinhao Bai
Mingming Liu
8
0
0
14 Sep 2022
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
18
0
0
08 Sep 2022
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval
Alex Falcon
G. Serra
Oswald Lanz
VGen
44
25
0
03 Aug 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
36
12
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
36
18
0
01 Aug 2022
Face-to-Face Contrastive Learning for Social Intelligence Question-Answering
Alex Wilf
Qianli Ma
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
41
10
0
29 Jul 2022
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
Li Xu
Haoxuan Qu
Jason Kuen
Jiuxiang Gu
Jun Liu
CML
33
27
0
23 Jul 2022
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
54
22
0
20 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Joey Tianyi Zhou
22
6
0
08 Jul 2022
Interactive Visual Reasoning under Uncertainty
Manjie Xu
Guangyuan Jiang
Wei Liang
Song-Chun Zhu
Yixin Zhu
LRM
47
5
0
18 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
55
228
0
16 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
16
507
0
03 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
43
68
0
02 Jun 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
45
29
0
13 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
39
33
0
10 May 2022
Episodic Memory Question Answering
Samyak Datta
Sameer Dharur
Vincent Cartillier
Ruta Desai
Mukul Khanna
Dhruv Batra
Devi Parikh
EgoV
19
31
0
03 May 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen
Kexin Yi
Yunzhu Li
Mingyu Ding
Antonio Torralba
J. Tenenbaum
Chuang Gan
CoGe
OCL
20
52
0
02 May 2022
Measuring Compositional Consistency for Video Question Answering
Mona Gandhi
Mustafa Omer Gul
Eva Prakash
Madeleine Grunde-McLaughlin
Ranjay Krishna
Maneesh Agrawala
CoGe
40
15
0
14 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
44
24
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
51
102
0
04 Apr 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
39
136
0
26 Mar 2022
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding
Yidan Sun
Qin Chao
Yangfeng Ji
Boyang Albert Li
VGen
40
10
0
11 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
35
27
0
08 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
32
87
0
02 Mar 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
21
3
0
24 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
30
7
0
08 Feb 2022
OpenQA: Hybrid QA System Relying on Structured Knowledge Base as well as Non-structured Data
Gaochen Wu
Bin Xu
Yuxin Qin
Yang Liu
Lingyu Liu
Ziwei Wang
27
0
0
31 Dec 2021
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition
A. Glavan
Estefanía Talavera
21
10
0
23 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
31
47
0
15 Dec 2021
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
27
8
0
30 Nov 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
24
0
25 Nov 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding
Zhenfang Chen
Tao Du
Ping Luo
J. Tenenbaum
Chuang Gan
VGen
PINN
OCL
38
74
0
28 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
25
79
0
11 Oct 2021
More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering
Yang Bai
D. Wang
96
10
0
25 Sep 2021
Self-supervised Learning for Semi-supervised Temporal Language Grounding
Fan Luo
Shaoxiang Chen
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
VLM
57
11
0
23 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
72
44
0
21 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
35
37
0
09 Sep 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
27
137
0
23 Aug 2021
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
16
2
0
11 Aug 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Fei Wu
Yi Yang
Yueting Zhuang
27
28
0
26 Jul 2021
Weakly Supervised Temporal Adjacent Network for Language Grounding
Yuechen Wang
Jiajun Deng
Wen-gang Zhou
Houqiang Li
26
67
0
30 Jun 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
54
166
0
21 Jun 2021
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
45
448
0
18 May 2021
gComm: An environment for investigating generalization in Grounded Language Acquisition
Rishi Hazra
Sonu Dixit
31
0
0
09 May 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
167
100
0
29 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee K. Wong
J. Tenenbaum
Chuang Gan
VGen
36
92
0
30 Mar 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Li Xu
He Huang
Jun Liu
ViT
LRM
17
83
0
29 Mar 2021