Self-Supervised Learning for Videos: A Survey

18 June 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
    SSL

Papers citing "Self-Supervised Learning for Videos: A Survey"

50 / 103 papers shown
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
63
122
0
06 Mar 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
61
159
0
14 Jan 2020
Deep Learning for 3D Point Clouds: A Survey
Yulan Guo
Hanyun Wang
Qingyong Hu
Hao Liu
Li Liu
Bennamoun
3DPC
75
1,666
0
27 Dec 2019
Self-Supervised Learning of Pretext-Invariant Representations
Ishan Misra
Laurens van der Maaten
SSL
VLM
90
1,451
0
04 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
85
430
0
28 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
169
12,065
0
13 Nov 2019
Video Representation Learning by Dense Predictive Coding
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
86
361
0
10 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
71
389
0
31 Jul 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
153
2,395
0
13 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
105
1,199
0
07 Jun 2019
Strategies for Pre-training Graph Neural Networks
Weihua Hu
Bowen Liu
Joseph Gomes
Marinka Zitnik
Percy Liang
Vijay S. Pande
J. Leskovec
SSL
AI4CE
108
1,398
0
29 May 2019
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature
Mang Ye
Xu-Yao Zhang
PongChi Yuen
Shih-Fu Chang
SSL
78
581
0
06 Apr 2019
Cross-task weakly supervised learning from instructional videos
Dimitri Zhukov
Jean-Baptiste Alayrac
R. G. Cinbis
David Fouhey
Ivan Laptev
Josef Sivic
SSL
113
249
0
19 Mar 2019
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing
Yingli Tian
SSL
108
1,697
0
16 Feb 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
55
144
0
05 Jan 2019
Iterative Reorganization with Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning
Chen Wei
Lingxi Xie
Xutong Ren
Yingda Xia
Chi Su
Jiaying Liu
Qi Tian
Alan Yuille
SSL
60
131
0
02 Dec 2018
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
Longlong Jing
Xiaodong Yang
Jingen Liu
Yingli Tian
66
156
0
28 Nov 2018
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dahun Kim
Donghyeon Cho
In So Kweon
SSL
65
347
0
24 Nov 2018
TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin
Chuang Gan
Song Han
85
1,683
0
20 Nov 2018
Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan
Rishi Madhok
Irfan Essa
SSL
56
109
0
22 Aug 2018
Recycle-GAN: Unsupervised Video Retargeting
Aayush Bansal
Shugao Ma
Deva Ramanan
Yaser Sheikh
VGen
DiffM
73
297
0
15 Aug 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Bruno Korbar
Du Tran
Lorenzo Torresani
95
474
0
30 Jun 2018
Boosting Self-Supervised Learning via Knowledge Transfer
M. Noroozi
Ananth Vinjimoor
Paolo Favaro
Hamed Pirsiavash
SSL
274
296
0
01 May 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
89
748
0
10 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
61
234
0
07 Apr 2018
Unsupervised Representation Learning by Predicting Image Rotations
Spyros Gidaris
Praveer Singh
N. Komodakis
OOD
SSL
DRL
235
3,283
0
21 Mar 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
92
545
0
09 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
92
529
0
18 Dec 2017
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
137
1,325
0
13 Dec 2017
Improvements to context based self-supervised learning
T. N. Mundhenk
Daniel E. Ho
Barry Y. Chen
SSL
44
120
0
17 Nov 2017
One Model To Learn Them All
Lukasz Kaiser
Aidan Gomez
Noam M. Shazeer
Ashish Vaswani
Niki Parmar
Llion Jones
Jakob Uszkoreit
VLM
ViT
74
333
0
16 Jun 2017
Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation
Ling Yang
Yizhe Zhang
Jianxu Chen
Siyuan Zhang
Danny Chen
MedIm
65
504
0
15 Jun 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
...
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
99
1,028
0
23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
219
7,989
0
22 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
134
1,242
0
02 May 2017
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu
Taesung Park
Phillip Isola
Alexei A. Efros
GAN
111
5,554
0
30 Mar 2017
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
250
789
0
16 Nov 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
112
1,265
0
27 Sep 2016
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
174
1,468
0
08 Sep 2016
Online Action Detection
R. D. Geest
E. Gavves
Amir Ghodrati
Zhenyang Li
Cees G. M. Snoek
Tinne Tuytelaars
OffRL
60
152
0
21 Apr 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
92
1,245
0
06 Apr 2016
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
M. Noroozi
Paolo Favaro
SSL
157
2
0
30 Mar 2016
Learning Representations for Automatic Colorization
Gustav Larsson
Michael Maire
Gregory Shakhnarovich
VLM
SSL
81
1,012
0
22 Mar 2016
Deep multi-scale video prediction beyond mean square error
Michaël Mathieu
Camille Couprie
Yann LeCun
GAN
122
1,882
0
17 Nov 2015
Spatial Transformer Networks
Max Jaderberg
Karen Simonyan
Andrew Zisserman
Koray Kavukcuoglu
292
7,379
0
05 Jun 2015
Unsupervised Visual Representation Learning by Context Prediction
Carl Doersch
Abhinav Gupta
Alexei A. Efros
DRL
SSL
164
2,782
0
19 May 2015
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff
Dmitry Kalenichenko
James Philbin
3DH
338
13,123
0
12 Mar 2015
Unsupervised Learning of Video Representations using LSTMs
Nitish Srivastava
Elman Mansimov
Ruslan Salakhutdinov
SSL
130
2,589
0
16 Feb 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
258
4,471
0
20 Nov 2014