v1v2v3 (latest)

Audiovisual Masked Autoencoders

9 December 2022

Mariana-Iuliana Georgescu

Papers citing "Audiovisual Masked Autoencoders"

27 / 77 papers shown

Title
End-to-End Learning of Visual Representations from Uncurated Instructional Videos Antoine Miech Jean-Baptiste Alayrac Lucas Smaira Ivan Laptev Josef Sivic Andrew Zisserman VGen SSL 126 712 0 13 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering Humam Alwassel D. Mahajan Bruno Korbar Lorenzo Torresani Guohao Li Du Tran SSL 95 431 0 28 Nov 2019
Momentum Contrast for Unsupervised Visual Representation Learning Kaiming He Haoqi Fan Yuxin Wu Saining Xie Ross B. Girshick SSL 204 12,085 0 13 Nov 2019
Self-labelling via simultaneous clustering and representation learning Yuki M. Asano Christian Rupprecht Andrea Vedaldi SSL 123 777 0 13 Nov 2019
What Makes Training Multi-Modal Classification Networks Hard? Weiyao Wang Du Tran Matt Feiszli 113 453 0 29 May 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition Daniel S. Park William Chan Yu Zhang Chung-Cheng Chiu Barret Zoph E. D. Cubuk Quoc V. Le VLM 177 3,461 0 18 Apr 2019
TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin Chuang Gan Song Han 98 1,691 0 20 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 94,891 0 11 Oct 2018
Deep Clustering for Unsupervised Learning of Visual Features Mathilde Caron Piotr Bojanowski Armand Joulin Matthijs Douze SSL 88 1,898 0 15 Jul 2018
Representation Learning with Contrastive Predictive Coding Aaron van den Oord Yazhe Li Oriol Vinyals DRL SSL 320 10,302 0 10 Jul 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization Bruno Korbar Du Tran Lorenzo Torresani 97 476 0 30 Jun 2018
Exploring the Limits of Weakly Supervised Pretraining D. Mahajan Ross B. Girshick Vignesh Ramanathan Kaiming He Manohar Paluri Yixuan Li Ashwin R. Bharambe Laurens van der Maaten VLM 188 1,368 0 02 May 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features Andrew Owens Alexei A. Efros SSL 98 752 0 10 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos Yapeng Tian Jing Shi Bochen Li Zhiyao Duan Chenliang Xu 101 438 0 23 Mar 2018
Objects that Sound Relja Arandjelović Andrew Zisserman ObjD VOS 101 530 0 18 Dec 2017
mixup: Beyond Empirical Risk Minimization Hongyi Zhang Moustapha Cissé Yann N. Dauphin David Lopez-Paz NoLa 280 9,764 0 25 Oct 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era Chen Sun Abhinav Shrivastava Saurabh Singh Abhinav Gupta VLM 191 2,401 0 10 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 713 131,652 0 12 Jun 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Priya Goyal Piotr Dollár Ross B. Girshick P. Noordhuis Lukasz Wesolowski Aapo Kyrola Andrew Tulloch Yangqing Jia Kaiming He 3DH 128 3,681 0 08 Jun 2017
Look, Listen and Learn Relja Arandjelović Andrew Zisserman SSL 115 905 0 23 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 235 8,019 0 22 May 2017
Context Encoders: Feature Learning by Inpainting Deepak Pathak Philipp Krahenbuhl Jeff Donahue Trevor Darrell Alexei A. Efros SSL 67 5,297 0 25 Apr 2016
Deep Networks with Stochastic Depth Gao Huang Yu Sun Zhuang Liu Daniel Sedra Kilian Q. Weinberger 215 2,357 0 30 Mar 2016
Colorful Image Colorization Richard Y. Zhang Phillip Isola Alexei A. Efros 135 3,530 0 28 Mar 2016
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens Z. Wojna 3DV BDL 883 27,373 0 02 Dec 2015
Unsupervised Visual Representation Learning by Context Prediction Carl Doersch Abhinav Gupta Alexei A. Efros DRL SSL 166 2,786 0 19 May 2015
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman 247 7,535 0 09 Jun 2014