2409.06665
Data Collection-free Masked Video Modeling
10 September 2024
Yuchi Ishikawa
Masayoshi Kondo
Yoshimitsu Aoki
ViT
Papers citing
"Data Collection-free Masked Video Modeling"
41 / 41 papers shown
Title
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Yikang Shen
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
61
3
0
10 Nov 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
93
166
0
28 Mar 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
79
91
0
08 Dec 2022
Procedural Image Programs for Representation Learning
Manel Baradad
Chun-Fu
Jonas Wulff
Tongzhou Wang
Rogerio Feris
Antonio Torralba
Phillip Isola
46
23
0
29 Nov 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
139
3,438
0
16 Oct 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chao-Yuan Wu
Alan Yuille
Christoph Feichtenhofer
ViT
139
668
0
16 Dec 2021
PASS: An ImageNet replacement for self-supervised pretraining without humans
Yuki M. Asano
Christian Rupprecht
Andrew Zisserman
Andrea Vedaldi
VLM
SSL
77
58
0
27 Sep 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
240
2,812
0
15 Jun 2021
Learning to See by Looking at Noise
Manel Baradad
Jonas Wulff
Tongzhou Wang
Phillip Isola
Antonio Torralba
63
92
0
10 Jun 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
90
262
0
29 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
146
1,862
0
05 Apr 2021
Self-supervised Motion Learning from Static Images
Ziyuan Huang
Shiwei Zhang
Jianwen Jiang
Mingqian Tang
Rong Jin
M. Ang
SSL
46
29
0
01 Apr 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
207
2,147
0
29 Mar 2021
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
Tian Pan
Yibing Song
Tianyu Yang
Wenhao Jiang
Wei Liu
73
225
0
10 Mar 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
362
2,045
0
09 Feb 2021
Pre-training without Natural Images
Hirokatsu Kataoka
Kazushige Okayasu
Asato Matsumoto
Eisuke Yamagata
Ryosuke Yamada
Nakamasa Inoue
Akio Nakamura
Y. Satoh
98
119
0
21 Jan 2021
VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun
Seong Joon Oh
Byeongho Heo
Dongyoon Han
Jinhyung Kim
405
74
0
07 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
557
40,961
0
22 Oct 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL
AI4TS
71
235
0
13 Aug 2020
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Zachary Teed
Jia Deng
MDE
214
2,619
0
26 Mar 2020
Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy
Kaiyu Yang
Klint Qinami
Li Fei-Fei
Jia Deng
Olga Russakovsky
108
320
0
16 Dec 2019
Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition
Jinwoo Choi
Chen Gao
Joseph C.E. Messou
Jia-Bin Huang
103
182
0
11 Dec 2019
Synthetic Humans for Action Recognition from Unseen Viewpoints
Gül Varol
Ivan Laptev
Cordelia Schmid
Andrew Zisserman
85
98
0
09 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
406
42,393
0
03 Dec 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
210
3,485
0
30 Sep 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
105
1,199
0
07 Jun 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
604
4,766
0
13 May 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
162
3,272
0
10 Dec 2018
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dahun Kim
Donghyeon Cho
In So Kweon
SSL
65
348
0
24 Nov 2018
Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan
Rishi Madhok
Irfan Essa
SSL
56
109
0
22 Aug 2018
Exploring the Limits of Weakly Supervised Pretraining
D. Mahajan
Ross B. Girshick
Vignesh Ramanathan
Kaiming He
Manohar Paluri
Yixuan Li
Ashwin R. Bharambe
Laurens van der Maaten
VLM
178
1,367
0
02 May 2018
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Kensho Hara
Hirokatsu Kataoka
Y. Satoh
3DPC
118
1,934
0
27 Nov 2017
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
279
8,902
0
21 Nov 2017
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
271
9,759
0
25 Oct 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Chen Sun
Abhinav Shrivastava
Saurabh Singh
Abhinav Gupta
VLM
180
2,393
0
10 Jul 2017
Temporal Segment Networks for Action Recognition in Videos
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
110
810
0
08 May 2017
Procedural Generation of Videos to Train Deep Action Recognition Networks
César Roberto de Souza
Adrien Gaidon
Yohann Cabon
A. Peña
60
144
0
02 Dec 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
124
1,268
0
27 Sep 2016
Deep Networks with Stochastic Depth
Gao Huang
Yu Sun
Zhuang Liu
Daniel Sedra
Kilian Q. Weinberger
207
2,356
0
30 Mar 2016
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV
BDL
855
27,350
0
02 Dec 2015
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
139
6,145
0
03 Dec 2012