MAAS: Multi-modal Assignation for Active Speaker Detection

MAAS: Multi-modal Assignation for Active Speaker Detection

11 January 2021

Juan Carlos León Alcázar

Fabian Caba Heilbron

Papers citing "MAAS: Multi-modal Assignation for Active Speaker Detection"

18 / 18 papers shown

Title
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization Detao Bai Zhiheng Ma Xihan Wei Liefeng Bo 120 0 0 06 May 2025
Robust Active Speaker Detection in Noisy Environments Siva Sai Nagender Vasireddy Chenxu Zhang Xiaohu Guo Yapeng Tian 40 0 0 27 Mar 2024
Target Active Speaker Detection with Audio-visual Cues Yiding Jiang Ruijie Tao Zexu Pan Haizhou Li 28 16 0 22 May 2023
Egocentric Auditory Attention Localization in Conversations Fiona Ryan Hao Jiang Abhinav Shukla James M. Rehg V. Ithapu EgoV 29 16 0 28 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset Tiago Roxo Joana Cabral Costa Pedro R. M. Inácio Hugo Manuel Proença 19 3 0 09 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection Xizi Wang Feng Cheng Gedas Bertasius David J. Crandall 26 15 0 19 Jan 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection Rahul Sharma Shrikanth Narayanan 37 8 0 01 Dec 2022
Unsupervised active speaker detection in media content using cross-modal information Rahul Sharma Shrikanth Narayanan 14 3 0 24 Sep 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection Kyle Min Sourya Roy Subarna Tripathi T. Guha Somdeb Majumdar 24 36 0 15 Jul 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022 Yuanhang Zhang Susan Liang Shuang Yang Shiguang Shan 8 4 0 22 Jun 2022
End-to-End Active Speaker Detection Juan Carlos León Alcázar M. Cordes Chen Zhao Guohao Li 24 27 0 27 Mar 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection Sourya Roy Kyle Min Subarna Tripathi T. Guha Somdeb Majumdar 35 3 0 02 Dec 2021
Sub-word Level Lip Reading With Visual Attention Prajwal K R Triantafyllos Afouras Andrew Zisserman 12 92 0 14 Oct 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild Okan Kopuklu Maja Taseska Gerhard Rigoll 3DV 19 45 0 07 Jun 2021
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 227 2,233 0 14 Jun 2018
Image Generation from Scene Graphs Justin Johnson Agrim Gupta Li Fei-Fei GNN 223 815 0 04 Apr 2018
Simple Online and Realtime Tracking with a Deep Association Metric N. Wojke Alex Bewley Dietrich Paulus VOT 228 3,465 0 21 Mar 2017
Lip Reading Sentences in the Wild Joon Son Chung A. Senior Oriol Vinyals Andrew Zisserman 164 784 0 16 Nov 2016