2208.06662
Self-Contained Entity Discovery from Captioned Videos
13 August 2022
M. Ayoughi
P. Mettes
Paul T. Groth
Papers citing "Self-Contained Entity Discovery from Captioned Videos"
30 / 30 papers shown
Title
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal
Quentin Duval
Isaac Seessel
Mathilde Caron
Ishan Misra
Levent Sagun
Armand Joulin
Piotr Bojanowski
VLM
SSL
93
111
0
16 Feb 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
403
1,114
0
13 Oct 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krähenbühl
VLM
ViT
117
170
0
21 Jun 2021
MovieNet: A Holistic Dataset for Movie Understanding
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
95
244
0
21 Jul 2020
Knowledge Graph Extraction from Videos
Louis Mahon
Eleonora Giunchiglia
Bowen Li
Thomas Lukasiewicz
48
20
0
20 Jul 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
108
102
0
08 May 2020
Clustering based Contrastive Learning for Improving Face Representations
Vivek Sharma
Makarand Tapaswi
M. Sarfraz
Rainer Stiefelhagen
CVBM
SSL
71
46
0
05 Apr 2020
Learning Interactions and Relationships between Movie Characters
Anna Kukleva
Makarand Tapaswi
Ivan Laptev
77
51
0
29 Mar 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
135
713
0
13 Dec 2019
Video Face Clustering with Unknown Number of Clusters
Makarand Tapaswi
M. Law
Sanja Fidler
CVBM
85
60
0
09 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
122
1,208
0
07 Jun 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
73
229
0
25 Apr 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
169
3,286
0
10 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
183
883
0
27 Nov 2018
LinkNet: Relational Embedding for Scene Graph
Sanghyun Woo
Dahun Kim
Donghyeon Cho
In So Kweon
GNN
64
147
0
15 Nov 2018
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Victor Sanh
Thomas Wolf
Sebastian Ruder
70
234
0
14 Nov 2018
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images
Sheng Guo
Weilin Huang
Haozhi Zhang
Chenfan Zhuang
Dengke Dong
Matthew R. Scott
Dinglong Huang
SSL
81
345
0
03 Aug 2018
MovieGraphs: Towards Understanding Human-Centric Situations from Videos
Paul Vicol
Makarand Tapaswi
Lluis Castrejon
Sanja Fidler
71
142
0
19 Dec 2017
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang
Zhengyuan Zhou
Thomas Leung
Li-Jia Li
Li Fei-Fei
NoLa
128
1,456
0
14 Dec 2017
Neural Motifs: Scene Graph Parsing with Global Context
Rowan Zellers
Mark Yatskar
Sam Thomson
Yejin Choi
GNN
95
999
0
17 Nov 2017
VGGFace2: A dataset for recognising faces across pose and age
Qiong Cao
Li Shen
Weidi Xie
Omkar M. Parkhi
Andrew Zisserman
CVBM
100
2,635
0
23 Oct 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
123
1,031
0
23 May 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
87
561
0
14 Apr 2017
Learning From Noisy Large-Scale Datasets With Minimal Supervision
Andreas Veit
N. Alldrin
Gal Chechik
Ivan Krasin
Abhinav Gupta
Serge J. Belongie
142
480
0
06 Jan 2017
Attend in groups: a weakly-supervised deep learning framework for learning from web data
Bohan Zhuang
Lingqiao Liu
Yao Li
Chunhua Shen
Ian Reid
NoLa
62
89
0
30 Nov 2016
Spatiotemporal Residual Networks for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Richard P. Wildes
123
719
0
07 Nov 2016
MovieQA: Understanding Stories in Movies through Question-Answering
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler
120
752
0
09 Dec 2015
The MegaFace Benchmark: 1 Million Faces for Recognition at Scale
Ira Kemelmacher-Shlizerman
S. M. Seitz
Daniel Miller
Evan Brossard
CVBM
85
863
0
02 Dec 2015
Unsupervised Learning from Narrated Instruction Videos
Jean-Baptiste Alayrac
Piotr Bojanowski
Nishant Agrawal
Josef Sivic
Ivan Laptev
Simon Lacoste-Julien
SSL
86
289
0
30 Jun 2015
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
124
502
0
12 Jan 2015