Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 178 papers shown

Title
Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang Ankush Gupta Andrew Zisserman ViT 37 27 0 20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning Otniel-Bogdan Mercea Thomas Hummel A. Sophia Koepke Zeynep Akata 38 25 0 20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos Madeline Chantry Schiappa Y. S. Rawat 17 4 0 16 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection Jiashuo Yu Jin-Yuan Liu Ying Cheng Rui Feng Yuejie Zhang 19 34 0 12 Jul 2022
Audio-Visual Segmentation Jinxing Zhou Jianyuan Wang Jingyang Zhang Weixuan Sun Jing Zhang Stan Birchfield Dan Guo Lingpeng Kong Meng Wang Yiran Zhong VOS 33 110 0 11 Jul 2022
Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation Jeong-Eun Choi Seongwon Jang Hyunsouk Cho Sehee Chung SSL 16 6 0 10 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration Chuang Gan Yi Gu Siyuan Zhou Jeremy Schwartz S. Alter James Traer Dan Gutfreund J. Tenenbaum Josh H. McDermott Antonio Torralba 47 19 0 07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization Jiashuo Yu Junfu Pu Ying Cheng Rui Feng Ying Shan 21 5 0 07 Jul 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection Abudukelimu Wuerkaixi You Zhang Z. Duan Changshui Zhang 18 10 0 21 Jun 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key! Chenglizhao Chen Mengke Song Wenfeng Song Li Guo Muwei Jian 35 25 0 20 Jun 2022
GaLeNet: Multimodal Learning for Disaster Prediction, Management and Relief Rohit Saha Meng Fang Angeline Yasodhara Kyryl Truskovskyi Azin Asgarian D. Homola Raahil Shah Frederik Dieleman Jack Weatheritt Thomas Rogers 23 3 0 18 Jun 2022
Self-Supervised Learning for Videos: A Survey Madeline Chantry Schiappa Y. S. Rawat M. Shah SSL 36 131 0 18 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos Rohit Girdhar Alaaeldin El-Nouby Mannat Singh Kalyan Vasudev Alwala Armand Joulin Ishan Misra ViT 37 97 0 16 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning Changan Chen Carl Schissler Sanchit Garg Philip Kobernik Alexander Clegg P. Calamia Dhruv Batra Philip Robinson Kristen Grauman 3DGS 33 79 0 16 Jun 2022
Multimodal Learning with Transformers: A Survey P. Xu Xiatian Zhu David A. Clifton ViT 54 527 0 13 Jun 2022
SoK: The Impact of Unlabelled Data in Cyberthreat Detection Giovanni Apruzzese Pavel Laskov A.T. Tastemirova 25 28 0 18 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies Mahdi M. Kalayeh Shervin Ardeshir Lingyi Liu Nagendra Kamath Ashok Chandrashekar SSL 29 3 0 29 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation Ziyang Chen David Fouhey Andrew Owens SSL 24 19 0 26 Apr 2022
Probabilistic Representations for Video Contrastive Learning Jungin Park Jiyoung Lee Ig-Jae Kim Kwanghoon Sohn SSL 29 43 0 08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound Yan-Bo Lin Jie Lei Joey Tianyi Zhou Gedas Bertasius 41 39 0 06 Apr 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices V. S. Kadandale Juan F. Montesinos G. Haro 24 23 0 05 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders Roman Bachmann David Mizrahi Andrei Atanov Amir Zamir 35 265 0 04 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis Karren D. Yang Dejan Marković Steven Krenn Vasu Agrawal Alexander Richard VGen 16 32 0 31 Mar 2022
Speaker Extraction with Co-Speech Gestures Cue Zexu Pan Xinyuan Qian Haizhou Li SLR 21 26 0 31 Mar 2022
The Sound of Bounding-Boxes Takashi Oya Shohei Iwase Shigeo Morishima 19 2 0 30 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining Zaid Khan B. Vijaykumar Xiang Yu S. Schulter Manmohan Chandraker Y. Fu CLIP VLM 25 16 0 27 Mar 2022
Object discovery and representation networks Olivier J. Hénaff Skanda Koppula Evan Shelhamer Daniel Zoran Andrew Jaegle Andrew Zisserman João Carreira Relja Arandjelović 44 87 0 16 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Saghir Alfasly Jian Lu C. Xu Yuru Zou 36 18 0 06 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 35 106 0 02 Mar 2022
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Mohamed Afham Isuru Dissanayake Dinithi Dissanayake Amaya Dharmasiri Kanchana Thilakarathna Ranga Rodrigo 3DPC 16 251 0 01 Mar 2022
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems Shuang Ma Sai H. Vemprala Wenshan Wang Jayesh K. Gupta Yale Song Daniel J. McDuff Ashish Kapoor SSL 37 9 0 20 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition Zitian Zhang Jie Zhang Jian-Shu Zhang Ming Wu Xin Fang Lirong Dai SSL 38 10 0 15 Feb 2022
Visual Acoustic Matching Changan Chen Ruohan Gao P. Calamia Kristen Grauman 21 55 0 14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 18 25 0 13 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources Sagnik Majumder Kristen Grauman 19 21 0 02 Feb 2022
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues Jannik Zürn Wolfram Burgard SSL 31 8 0 30 Jan 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization Hao Jiang Calvin Murdock V. Ithapu EgoV 27 40 0 06 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks A. Vasudevan Dengxin Dai Luc Van Gool SSL 33 6 0 04 Jan 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence Di Hu Yake Wei Rui Qian Weiyao Lin Ruihua Song Ji-Rong Wen 24 41 0 22 Dec 2021
Denoised Labels for Financial Time-Series Data via Self-Supervised Learning Yanqing Ma Carmine Ventre M. Polukarov NoLa 23 7 0 19 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints Srijan Das Michael S. Ryoo SSL 37 17 0 07 Dec 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound Zhijian Yang Xiaoran Fan Volkan Isler H. Park 3DH 16 6 0 01 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 15 28 0 21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 36 20 0 15 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 29 8 0 26 Oct 2021
Self-Supervised Visual Representation Learning Using Lightweight Architectures Prathamesh Sonawane Sparsh Drolia Saqib Nizam Shamsi Bhargav Jain SSL 17 1 0 21 Oct 2021
Self-Supervised Representation Learning: Introduction, Advances and Challenges Linus Ericsson Henry Gouk Chen Change Loy Timothy M. Hospedales SSL OOD AI4TS 34 273 0 18 Oct 2021
HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media Anargyros Chatzitofis Leonidas Saroglou Prodromos Boutis Petros Drakoulis N. Zioulis ... C. Charbonnier Pablo César D. Zarpalas Stefanos D. Kollias P. Daras 3DH 29 48 0 14 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 34 0 0 13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 232 1,019 0 13 Oct 2021