Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

50 / 316 papers shown

Active Contrastive Learning of Audio-Visual Video Representations Shuang Ma Zhaoyang Zeng Daniel J. McDuff Yale Song VLM SSL 62 8 0 31 Aug 2020
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics Jiangliu Wang Jianbo Jiao Linchao Bao Shengfeng He Wei Liu Yunhui Liu SSL AI4TS 75 55 0 31 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction Jiangliu Wang Jianbo Jiao Yunhui Liu SSL AI4TS 103 237 0 13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning Ying Cheng Ruize Wang Zhihao Pan Rui Feng Yuejie Zhang SSL 150 110 0 13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning Rui Qian Tianjian Meng Boqing Gong Ming-Hsuan Yang Haoran Wang Serge J. Belongie Huayu Chen SSL AI4TS 180 502 0 09 Aug 2020
Memory-augmented Dense Predictive Coding for Video Representation Learning Tengda Han Weidi Xie Andrew Zisserman SSL 149 242 0 03 Aug 2020
Learning Video Representations from Textual Web Supervision Jonathan C. Stroud Zhichao Lu Chen Sun Jia Deng Rahul Sukthankar Cordelia Schmid David A. Ross SSL 115 48 0 29 Jul 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events Chuang Gan Xiaoyu Chen Phillip Isola Antonio Torralba J. Tenenbaum 70 7 0 27 Jul 2020
Video Representation Learning by Recognizing Temporal Transformations Simon Jenni Givi Meishvili Paolo Favaro 203 135 0 21 Jul 2020
CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language Jiasong Wu Xuan Li Taotao Li Fanman Meng Youyong Kong Guanyu Yang L. Senhadji Huazhong Shu CVBM 97 0 0 21 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing Yapeng Tian Dingzeyu Li Chenliang Xu 151 185 0 21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation Hang Zhou Xudong Xu Dahua Lin Xiaogang Wang Ziwei Liu DiffM 103 84 0 20 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation Lingyu Zhu Esa Rahtu 89 9 0 15 Jul 2020
Do We Need Sound for Sound Source Localization? Takashi Oya Shohei Iwase Ryota Natsume Takahiro Itazuri Shugo Yamaguchi Shigeo Morishima 55 22 0 11 Jul 2020
Self-Supervised MultiModal Versatile Networks Jean-Baptiste Alayrac Adrià Recasens R. Schneider Relja Arandjelović Jason Ramapuram J. Fauw Lucas Smaira Sander Dieleman Andrew Zisserman SSL 219 375 0 29 Jun 2020
Video Representation Learning with Visual Tempo Consistency Ceyuan Yang Yinghao Xu Bo Dai Bolei Zhou 75 92 0 28 Jun 2020
Space-Time Correspondence as a Contrastive Random Walk Allan Jabri Andrew Owens Alexei A. Efros SSL OT 170 304 0 25 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision Yuki M. Asano Mandela Patrick Christian Rupprecht Andrea Vedaldi SSL 151 152 0 24 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 107 142 0 16 Jun 2020
Solos: A Dataset for Audio-Visual Music Analysis Juan F. Montesinos Olga Slizovskaia G. Haro 67 11 0 14 Jun 2020
DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition Ziming Liu Guangyu Gao A. K. Qin Jinyang Li ViT 62 1 0 13 Jun 2020
Video Understanding as Machine Translation Bruno Korbar Fabio Petroni Rohit Girdhar Lorenzo Torresani SSL 99 29 0 12 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound Karren D. Yang Bryan C. Russell Justin Salamon SSL 118 76 0 11 Jun 2020
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data Haytham M. Fayek Anurag Kumar 103 36 0 29 May 2020
Self-supervised Modal and View Invariant Feature Learning Longlong Jing Yucheng Chen Ling Zhang Mingyi He Yingli Tian 3DPC SSL 63 29 0 28 May 2020
End-to-End Lip Synchronisation Based on Pattern Classification You Jin Kim Hee-Soo Heo Soo-Whan Chung Bong-Jin Lee CVBM 47 0 0 18 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? Abhinav Shukla Stavros Petridis Maja Pantic SSL 109 28 0 04 May 2020
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision Soo-Whan Chung Hong-Goo Kang Joon Son Chung SSL 57 39 0 29 Apr 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement Pedro Morgado Nuno Vasconcelos Ishan Misra SSL 98 276 0 27 Apr 2020
Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences Longlong Jing Yucheng Chen Ling Zhang Mingyi He Yingli Tian 3DPC SSL 78 34 0 13 Apr 2020
Conditioned Source Separation for Music Instrument Performances Olga Slizovskaia G. Haro E. Gómez 74 40 0 08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition Arsha Nagrani Chen Sun David A. Ross Rahul Sukthankar Cordelia Schmid Andrew Zisserman 93 54 0 30 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos Daniel Gordon Kiana Ehsani Dieter Fox Ali Farhadi SSL AI4TS 102 90 0 18 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning Mandela Patrick Yuki M. Asano Polina Kuznetsova Ruth C. Fong João F. Henriques Geoffrey Zweig Andrea Vedaldi 109 50 0 09 Mar 2020
Cross-modal Learning for Multi-modal Video Categorization Palash Goyal Saurabh Sahu Shalini Ghosh Chul Lee 79 9 0 07 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning Elad Amrani Rami Ben-Ari Daniel Rotman A. Bronstein 152 126 0 06 Mar 2020
VideoSSL: Semi-Supervised Learning for Video Classification Longlong Jing T. Parag Zhe Wu Yingli Tian Hongcheng Wang 71 52 0 29 Feb 2020
Evolving Losses for Unsupervised Video Representation Learning A. Piergiovanni A. Angelova Michael S. Ryoo SSL 89 140 0 26 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani Joon Son Chung Samuel Albanie Andrew Zisserman SSL 99 88 0 20 Feb 2020
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition Jonathan Munro Dima Damen EgoV 99 196 0 27 Jan 2020
Curriculum Audiovisual Learning Di Hu Zechuan Wang Haoyi Xiong Dong Wang Feiping Nie Dejing Dou SSL 74 32 0 26 Jan 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 310 209 0 23 Jan 2020
Learning Spatiotemporal Features via Video and Text Pair Discrimination Tianhao Li Limin Wang VGen 81 57 0 16 Jan 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng Ran He 90 162 0 14 Jan 2020
Visually Guided Self Supervised Learning of Speech Representations Abhinav Shukla Konstantinos Vougioukas Pingchuan Ma Stavros Petridis Maja Pantic SSL 87 25 0 13 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network A. Tsiami Petros Koutras Petros Maragos 108 73 0 09 Jan 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos Antoine Miech Jean-Baptiste Alayrac Lucas Smaira Ivan Laptev Josef Sivic Andrew Zisserman VGen SSL 164 713 0 13 Dec 2019
Self-Supervised Learning of Pretext-Invariant Representations Ishan Misra Laurens van der Maaten SSL VLM 149 1,461 0 04 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering Humam Alwassel D. Mahajan Bruno Korbar Lorenzo Torresani Guohao Li Du Tran SSL 219 433 0 28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 86 55 0 20 Nov 2019