Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.02467
Cited By
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
6 June 2019
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering"
18 / 118 papers shown
Title
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
34
136
0
26 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
32
86
0
02 Mar 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
21
3
0
24 Feb 2022
NEWSKVQA: Knowledge-Aware News Video Question Answering
Pranay Gupta
Manish Gupta
30
7
0
08 Feb 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
27
8
0
30 Nov 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360
∘
^\circ
∘
Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
25
79
0
11 Oct 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
18
53
0
19 Jun 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
19
19
0
18 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
33
372
0
04 Jun 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
18
40
0
18 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
45
446
0
18 May 2021
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Li Xu
He Huang
Jun Liu
ViT
LRM
17
83
0
29 Mar 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
35
31
0
05 Jan 2021
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
.Ilker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
27
31
0
08 Dec 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
32
312
0
29 Apr 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,465
0
06 Jun 2016
Previous
1
2
3