2204.03101
Hierarchical Self-supervised Representation Learning for Movie Understanding
6 April 2022
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
Papers citing
"Hierarchical Self-supervised Representation Learning for Movie Understanding"
42 / 42 papers shown
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
97
1
0
27 Mar 2025
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
66
1
0
30 Jul 2024
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krahenbuhl
VLM
ViT
111
169
0
21 Jun 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Mohit Bansal
78
66
0
21 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM
ViT
97
50
0
17 Jun 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
90
262
0
29 Apr 2021
Visual Semantic Role Labeling for Video Understanding
Arka Sadhu
Tanmay Gupta
Mark Yatskar
Ram Nevatia
Aniruddha Kembhavi
VLM
45
70
0
02 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
73
128
0
30 Mar 2021
Spatiotemporal Contrastive Video Representation Learning
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Huisheng Wang
Serge J. Belongie
Yin Cui
SSL
AI4TS
104
499
0
09 Aug 2020
MovieNet: A Holistic Dataset for Movie Understanding
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
76
237
0
21 Jul 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
347
6,792
0
13 Jun 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
86
101
0
08 May 2020
SpeedNet: Learning the Speediness in Videos
Sagie Benaim
Ariel Ephrat
Oran Lang
Inbar Mosseri
William T. Freeman
Michael Rubinstein
Michal Irani
Tali Dekel
69
260
0
13 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
339
18,721
0
13 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
179
12,065
0
13 Nov 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
541
24,422
0
26 Jul 2019
DynamoNet: Dynamic Action and Motion Network
Ali Diba
Vivek Sharma
Luc Van Gool
Rainer Stiefelhagen
78
110
0
25 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
75
1,246
0
03 Apr 2019
Learning Correspondence from the Cycle-Consistency of Time
Xiaolong Wang
Allan Jabri
Alexei A. Efros
SSL
74
490
0
18 Mar 2019
Long-Term Feature Banks for Detailed Video Understanding
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
161
480
0
12 Dec 2018
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
162
3,262
0
10 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,729
0
11 Oct 2018
Actor-Centric Relation Network
Chen Sun
Abhinav Shrivastava
Carl Vondrick
Kevin Patrick Murphy
Rahul Sukthankar
Cordelia Schmid
88
221
0
28 Jul 2018
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
300
10,282
0
10 Jul 2018
Tracking Emerges by Colorizing Videos
Carl Vondrick
Abhinav Shrivastava
Alireza Fathi
S. Guadarrama
Kevin Patrick Murphy
75
377
0
25 Jun 2018
Object Level Visual Reasoning in Videos
Fabien Baradel
Natalia Neverova
Christian Wolf
J. Mille
Greg Mori
80
164
0
16 Jun 2018
Videos as Space-Time Region Graphs
Xiaolong Wang
Abhinav Gupta
83
756
0
05 Jun 2018
MovieGraphs: Towards Understanding Human-Centric Situations from Videos
Paul Vicol
Makarand Tapaswi
Lluis Castrejon
Sanja Fidler
68
140
0
19 Dec 2017
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma
Asim Kadav
I. Melvin
Z. Kira
G. Al-Regib
H. Graf
52
145
0
16 Nov 2017
A Read-Write Memory Network for Movie Story Understanding
Seil Na
Sangho Lee
Jisung Kim
Gunhee Kim
AIMat
73
98
0
27 Sep 2017
Lattice Long Short-Term Memory for Human Action Recognition
Lin Sun
Kui Jia
Kevin Chen
Dit-Yan Yeung
Bertram E. Shi
Silvio Savarese
59
156
0
13 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
658
131,414
0
12 Jun 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
101
1,030
0
23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
219
8,012
0
22 May 2017
The Kinetics Human Action Video Dataset
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
...
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
231
3,801
0
19 May 2017
Self-Supervised Video Representation Learning With Odd-One-Out Networks
Basura Fernando
Hakan Bilen
E. Gavves
Stephen Gould
SSL
42
450
0
21 Nov 2016
VideoLSTM Convolves, Attends and Flows for Action Recognition
Zhenyang Li
E. Gavves
Mihir Jain
Cees G. M. Snoek
90
465
0
06 Jul 2016
MovieQA: Understanding Stories in Movies through Question-Answering
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler
109
746
0
09 Dec 2015
Beyond Short Snippets: Deep Networks for Video Classification
Joe Yue-Hei Ng
Matthew J. Hausknecht
Sudheendra Vijayanarasimhan
Oriol Vinyals
R. Monga
G. Toderici
139
2,336
0
31 Mar 2015
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
105
500
0
12 Jan 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
268
4,481
0
20 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
156
6,048
0
17 Nov 2014