Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

30 June 2018

Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"

Showing 50 of 316 citing papers

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 80 26 0 13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition Arda Senocak Junsik Kim Tae-Hyun Oh H. Ryu Dingzeyu Li In So Kweon 87 1 0 12 Feb 2022
Real-time Emergency Vehicle Event Detection Using Audio Data Zubayer Islam Mohamed Abdel-Aty 28 6 0 03 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection A. Haliassos Rodrigo Mira Stavros Petridis Maja Pantic CVBM 121 133 0 18 Jan 2022
SS-3DCapsNet: Self-supervised 3D Capsule Networks for Medical Segmentation on Less Labeled Data Minh-Khoi Tran Loi Ly Binh-Son Hua Ngan Le 3DPC MedIm 81 17 0 15 Jan 2022
Robust Contrastive Learning against Noisy Views Ching-Yao Chuang R. Devon Hjelm Xin Eric Wang Vibhav Vineet Neel Joshi Antonio Torralba Stefanie Jegelka Ya-heng Song NoLa 66 72 0 12 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning Haopeng Li Qiuhong Ke Mingming Gong Tom Drummond AI4TS 80 19 0 07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Bowen Shi Wei-Ning Hsu Kushal Lakhotia Abdel-rahman Mohamed SSL 143 321 0 05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks A. Vasudevan Dengxin Dai Luc Van Gool SSL 90 6 0 04 Jan 2022
Decompose the Sounds and Pixels, Recompose the Events Varshanth R. Rao Md Ibrahim Khalil Haoda Li Peng Dai Juwei Lu 58 5 0 21 Dec 2021
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer Yanpeng Zhao Jack Hessel Youngjae Yu Ximing Lu Rowan Zellers Yejin Choi 129 27 0 16 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 79 23 0 09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning Rui Qian Yeqing Li Liangzhe Yuan Boqing Gong Ting Liu Matthew A. Brown Serge Belongie Ming-Hsuan Yang Hartwig Adam Huayu Chen AI4TS 106 6 0 08 Dec 2021
Audio-Visual Synchronisation in the wild Honglie Chen Weidi Xie Triantafyllos Afouras Arsha Nagrani Andrea Vedaldi Andrew Zisserman 127 40 0 08 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning Manlin Zhang Jinpeng Wang A. J. Ma 85 9 0 07 Dec 2021
Time-Equivariant Contrastive Video Representation Learning Simon Jenni Hailin Jin SSL AI4TS 212 61 0 07 Dec 2021
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup Siyuan Li Zicheng Liu Zedong Wang Di Wu Zihan Liu Stan Z. Li 111 27 0 30 Nov 2021
AVA-AVD: Audio-Visual Speaker Diarization in the Wild Eric Z. Xu Zeyang Song Satoshi Tsutsui C. Feng Mang Ye Mike Zheng Shou VGen 85 43 0 29 Nov 2021
NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes Suhani Vora Noha Radwan Klaus Greff H. Meyer Kyle Genova Mehdi S. M. Sajjadi Etienne Pot Andrea Tagliasacchi Daniel Duckworth 166 127 0 25 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing Jiashuo Yu Ying Cheng Ruiwei Zhao Rui Feng Yuejie Zhang 110 62 0 24 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 82 21 0 15 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound Ziyang Chen Xixi Hu Andrew Owens 121 26 0 10 Nov 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity Pritam Sarkar Ali Etemad SSL 100 11 0 09 Nov 2021
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing Aadarsh Sahoo Rutav Shah Yikang Shen Kate Saenko Abir Das 84 65 0 28 Oct 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 80 8 0 26 Oct 2021
Learning 3D Semantic Segmentation with only 2D Image Supervision Kyle Genova Xiaoqi Yin Abhijit Kundu C. Pantofaru Forrester Cole Avneesh Sud B. Brewington B. Shucker Thomas Funkhouser 3DPC 69 81 0 21 Oct 2021
Constrained Mean Shift for Representation Learning Ajinkya Tejankar Soroush Abbasi Koohpayegani Hamed Pirsiavash SSL 58 0 0 19 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition M. Planamente Chiara Plizzari Emanuele Alberti Barbara Caputo EgoV 126 35 0 19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 68 0 0 13 Oct 2021
Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning Zixu Zhao Yueming Jin Pheng-Ann Heng SSL 90 21 0 28 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation Lingyu Zhu Esa Rahtu 116 10 0 18 Sep 2021
Learning Cross-modal Contrastive Features for Video Domain Adaptation Donghyun Kim Yi-Hsuan Tsai Bingbing Zhuang Xiang Yu Stan Sclaroff Kate Saenko Manmohan Chandraker 92 73 0 26 Aug 2021
Temporal Knowledge Consistency for Unsupervised Visual Representation Learning Wei Feng Yuanjiang Wang Lihua Ma Ye Yuan Fangqiu Yi SSL 55 13 0 24 Aug 2021
Self-Supervised Video Representation Learning with Meta-Contrastive Network Yuanze Lin Xun Guo Yan Lu SSL 78 41 0 19 Aug 2021
How Self-Supervised Learning Can be Used for Fine-Grained Head Pose Estimation? Mahdi Pourmirzaei Farzaneh Esmaili G. Montazer Sasan Karamizadeh Seyedehsamaneh Shojaeilangari 62 0 0 10 Aug 2021
Learning to Cut by Watching Movies Alejandro Pardo Fabian Caba Heilbron Juan Carlos León Alcázar Ali K. Thabet Guohao Li VGen 127 20 0 09 Aug 2021
Video Contrastive Learning with Global Context Haofei Kuang Yi Zhu Zhi-Li Zhang Xinyu Li Joseph Tighe Sören Schwertfeger C. Stachniss Mu Li SSL AI4TS 93 61 0 05 Aug 2021
Federated Self-Training for Semi-Supervised Audio Recognition Vasileios Tsouvalas Aaqib Saeed T. Ozcelebi FedML 88 16 0 14 Jul 2021
Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging Rhydian Windsor A. Jamaludin T. Kadir Andrew Zisserman 90 16 0 14 Jul 2021
Towards Long-Form Video Understanding Chaoxia Wu Philipp Krahenbuhl VLM ViT 125 170 0 21 Jun 2021
Improving Multi-Modal Learning with Uni-Modal Teachers Chenzhuang Du Tingle Li Yichen Liu Zixin Wen Tianyu Hua Yue Wang Hang Zhao 59 47 0 21 Jun 2021
Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention Efthymios Tzinis Scott Wisdom Tal Remez J. Hershey VLM 92 8 0 17 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision Pingchuan Ma Rodrigo Mira Stavros Petridis Björn W. Schuller Maja Pantic SSL 63 54 0 16 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows Mahdi M. Kalayeh Nagendra Kamath Lingyi Liu Ashok Chandrashekar SSL 55 2 0 16 Jun 2021
Learning Audio-Visual Dereverberation Changan Chen Wei-Ju Sun David Harwath Kristen Grauman 95 32 0 14 Jun 2021
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning Shaobo Min Qi Dai Hongtao Xie Chuang Gan Yongdong Zhang Jingdong Wang SSL 72 7 0 13 Jun 2021
Anticipative Video Transformer Rohit Girdhar Kristen Grauman ViT 96 212 0 03 Jun 2021
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment M. Planamente Chiara Plizzari Emanuele Alberti Barbara Caputo EgoV 127 12 0 03 Jun 2021
Automatic audiovisual synchronisation for ultrasound tongue imaging Aciel Eshky J. Cleland M. Ribeiro Eleanor Sugden Korin Richmond Steve Renals 30 7 0 31 May 2021
Home Action Genome: Cooperative Compositional Action Understanding Nishant Rai Haofeng Chen Jingwei Ji Rishi Desai Kazuki Kozuka Shun Ishizaka Ehsan Adeli Juan Carlos Niebles 45 78 0 11 May 2021