Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1604.01753
Cited By
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
6 April 2016
Gunnar A. Sigurdsson
Gül Varol
Xueliang Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding"
50 / 287 papers shown
Title
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection
Salman Khan
Fabio Cuzzolin
3DPC
51
5
0
16 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
26
19
0
16 Apr 2021
Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning
Soheyla Amirian
Khaled Rasheed
T. Taha
H. Arabnia
VLM
VGen
19
23
0
07 Apr 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
28
7
0
01 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
26
26
0
24 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
27
5
0
18 Mar 2021
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya
Michael S. Ryoo
AI4TS
53
38
0
01 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
36
14
0
01 Mar 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu
Yao Hu
S. Bai
Fei Ding
X. Bai
Philip Torr
46
82
0
17 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
38
185
0
11 Dec 2020
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
Sanath Narayan
Hisham Cholakkal
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
Ling Shao
27
54
0
11 Dec 2020
Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Yijuan Lu
Jiebo Luo
27
51
0
04 Dec 2020
Spatial-Temporal Alignment Network for Action Recognition and Detection
Junwei Liang
Liangliang Cao
Xuehan Xiong
Ting Yu
Alexander G. Hauptmann
3DPC
16
9
0
04 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
A. Deliège
A. Cioppa
Silvio Giancola
M. J. Seikavandi
J. Dueholm
Kamal Nasrollahi
Guohao Li
T. Moeslund
Marc Van Droogenbroeck
18
152
0
26 Nov 2020
Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos
Di Yang
Rui Dai
Yaohui Wang
Rupayan Mallick
Luca Minciullo
Gianpiero Francesca
F. Brémond
33
16
0
10 Nov 2020
Reducing the Annotation Effort for Video Object Segmentation Datasets
P. Voigtlaender
Lishu Luo
C. Yuan
Yong-jia Jiang
Bastian Leibe
VOS
36
19
0
02 Nov 2020
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
Yuanhao Zhai
Le Wang
Wei Tang
Qilin Zhang
Junsong Yuan
G. Hua
36
134
0
22 Oct 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
23
6
0
21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
47
30
0
20 Oct 2020
Pose And Joint-Aware Action Recognition
Anshul B. Shah
Shlok Kumar Mishra
Ankan Bansal
Jun-Cheng Chen
Ramalingam Chellappa
Abhinav Shrivastava
39
33
0
16 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
137
11
0
13 Oct 2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
Jie Wu
Guanbin Li
Xiaoguang Han
Liang Lin
OffRL
AI4TS
27
56
0
18 Sep 2020
HAA500: Human-Centric Atomic Action Dataset with Curated Videos
Jihoon Chung
Cheng-hsin Wuu
Hsuan-ru Yang
Yu-Wing Tai
Chi-Keung Tang
18
43
0
11 Sep 2020
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
21
74
0
01 Sep 2020
In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan
Tianhong Li
Yuan. Yuan
Dina Katabi
40
47
0
25 Aug 2020
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Minuk Ma
Sunjae Yoon
Junyeong Kim
Youngjoon Lee
Sunghun Kang
Chang D. Yoo
40
78
0
24 Aug 2020
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Zhu Zhang
Zhijie Lin
Zhou Zhao
Jieming Zhu
Xiuqiang He
22
69
0
19 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Fei Wu
14
34
0
16 Aug 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization
Daizong Liu
Xiaoye Qu
Xiao-Yang Liu
Jianfeng Dong
Pan Zhou
Zichuan Xu
33
129
0
04 Aug 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
180
0
21 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
Noa Garcia
Yuta Nakashima
23
32
0
17 Jul 2020
TinyVIRAT: Low-resolution Video Action Recognition
Ugur Demir
Yogesh S Rawat
M. Shah
33
36
0
14 Jul 2020
AViD Dataset: Anonymized Videos from Diverse Countries
A. Piergiovanni
Michael S. Ryoo
33
35
0
10 Jul 2020
Aligning Videos in Space and Time
Senthil Purushwalkam
Tian-Chun Ye
Saurabh Gupta
Abhinav Gupta
30
23
0
09 Jul 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
Guosheng Lin
34
28
0
27 Jun 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
31
40
0
24 Jun 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
19
437
0
23 Jun 2020
Understanding Human Hands in Contact at Internet Scale
Dandan Shan
Jiaqi Geng
Michelle Shu
David Fouhey
42
319
0
11 Jun 2020
IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition
Hyeokhyen Kwon
C. Tong
H. Haresamudram
Yan Gao
G. Abowd
Nicholas D. Lane
Thomas Ploetz
10
83
0
29 May 2020
Visual Relationship Detection using Scene Graphs: A Survey
Aniket Agarwal
Ayush Mangal
Vipul
GNN
25
20
0
16 May 2020
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li
Alireza Zareian
Qi Zeng
Spencer Whitehead
Di Lu
Heng Ji
Shih-Fu Chang
10
103
0
05 May 2020
The AVA-Kinetics Localized Human Actions Video Dataset
Ang Li
Meghana Thotakuri
David A. Ross
João Carreira
Alexander Vostrikov
Andrew Zisserman
VGen
19
133
0
01 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
32
312
0
29 Apr 2020
Action recognition in real-world videos
Waqas Sultani
Qazi Ammar Arshad
Chen Chen
26
2
0
22 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
73
1,001
0
09 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
22
283
0
07 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
33
54
0
30 Mar 2020
Previous
1
2
3
4
5
6
Next