v1v2 (latest)

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

8 December 2022

Zuxuan Wu

Lu Yuan

ArXiv (abs)PDF HTML Github (126★)

Papers citing "Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning"

31 / 81 papers shown

Title
Asynchronous Interaction Aggregation for Action Detection Jiajun Tang Jinchao Xia Xinzhi Mu Bo Pang Cewu Lu 68 120 0 16 Apr 2020
SpeedNet: Learning the Speediness in Videos Sagie Benaim Ariel Ephrat Oran Lang Inbar Mosseri William T. Freeman Michael Rubinstein Michal Irani Tali Dekel 72 261 0 13 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition Christoph Feichtenhofer 143 1,024 0 09 Apr 2020
A Simple Framework for Contrastive Learning of Visual Representations Ting-Li Chen Simon Kornblith Mohammad Norouzi Geoffrey E. Hinton SSL 375 18,866 0 13 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning Kaiming He Haoqi Fan Yuxin Wu Saining Xie Ross B. Girshick SSL 210 12,124 0 13 Nov 2019
RandAugment: Practical automated data augmentation with a reduced search space E. D. Cubuk Barret Zoph Jonathon Shlens Quoc V. Le MQ 250 3,502 0 30 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy M. Lewis Luke Zettlemoyer Veselin Stoyanov AIMat 677 24,541 0 26 Jul 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features Sangdoo Yun Dongyoon Han Seong Joon Oh Sanghyuk Chun Junsuk Choe Y. Yoo OOD 622 4,802 0 13 May 2019
Video Classification with Channel-Separated Convolutional Networks Du Tran Heng Wang Lorenzo Torresani Matt Feiszli 3DV 75 589 0 04 Apr 2019
SlowFast Networks for Video Recognition Christoph Feichtenhofer Haoqi Fan Jitendra Malik Kaiming He 169 3,282 0 10 Dec 2018
TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin Chuang Gan Song Han 98 1,692 0 20 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,175 0 11 Oct 2018
A Closer Look at Spatiotemporal Convolutions for Action Recognition Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun Manohar Paluri 235 3,033 0 30 Nov 2017
Non-local Neural Networks Xinyu Wang Ross B. Girshick Abhinav Gupta Kaiming He OffRL 295 8,917 0 21 Nov 2017
Decoupled Weight Decay Regularization I. Loshchilov Frank Hutter OffRL 149 2,151 0 14 Nov 2017
mixup: Beyond Empirical Risk Minimization Hongyi Zhang Moustapha Cissé Yann N. Dauphin David Lopez-Paz NoLa 289 9,803 0 25 Oct 2017
The "something something" video database for learning and evaluating visual common sense Raghav Goyal Samira Ebrahimi Kahou Vincent Michalski Joanna Materzynska S. Westphal ... Moritz Mueller-Freitag F. Hoppe Christian Thurau Ingo Bax Roland Memisevic VLM 98 1,542 0 13 Jun 2017
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions Chunhui Gu Chen Sun David A. Ross Carl Vondrick C. Pantofaru ... G. Toderici Susanna Ricco Rahul Sukthankar Cordelia Schmid Jitendra Malik VGen 121 1,031 0 23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 235 8,038 0 22 May 2017
Temporal Segment Networks for Action Recognition in Videos Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang Luc Van Gool ViT 114 812 0 08 May 2017
SGDR: Stochastic Gradient Descent with Warm Restarts I. Loshchilov Frank Hutter ODL 350 8,174 0 13 Aug 2016
Deep Networks with Stochastic Depth Gao Huang Yu Sun Zhuang Liu Daniel Sedra Kilian Q. Weinberger 215 2,361 0 30 Mar 2016
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens Z. Wojna 3DV BDL 886 27,416 0 02 Dec 2015
Unsupervised Learning of Visual Representations using Videos Xinyu Wang Abhinav Gupta SSL 92 232 0 04 May 2015
Beyond Short Snippets: Deep Networks for Video Classification Joe Yue-Hei Ng Matthew J. Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals R. Monga G. Toderici 145 2,338 0 31 Mar 2015
Distilling the Knowledge in a Neural Network Geoffrey E. Hinton Oriol Vinyals J. Dean FedML 364 19,733 0 09 Mar 2015
Learning Spatiotemporal Features with 3D Convolutional Networks Du Tran Lubomir D. Bourdev Rob Fergus Lorenzo Torresani Manohar Paluri 3DPC 77 411 0 02 Dec 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 165 6,056 0 17 Nov 2014
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman 256 7,542 0 09 Jun 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild K. Soomro Amir Zamir M. Shah CLIP VGen 160 6,164 0 03 Dec 2012
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 460 7,667 0 03 Jul 2012