How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild

7 June 2021

Papers citing "How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild"

27 / 27 papers shown

LASER: Lip Landmark Assisted Speaker Detection for Robustness Le Thien Phuc Nguyen Zhuliang Yu Yong Jae Lee 39 1 0 21 Jan 2025
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection Andrea Appiani Cigdem Beyan CLIP VLM 28 0 0 18 Oct 2024
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges Victoria Mingote Alfonso Ortega A. Miguel Eduardo Lleida 30 0 0 09 Sep 2024
Robust Active Speaker Detection in Noisy Environments Siva Sai Nagender Vasireddy Chenxu Zhang Xiaohu Guo Yapeng Tian 40 0 0 27 Mar 2024
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning Chaeyoung Jung Suyeon Lee Kihyun Nam Kyeongha Rho You Jin Kim Youngjoon Jang Joon Son Chung 20 9 0 21 Sep 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos Sagnik Majumder Ziad Al-Halah Kristen Grauman SSL EgoV 36 4 0 10 Jul 2023
Target Active Speaker Detection with Audio-visual Cues Yiding Jiang Ruijie Tao Zexu Pan Haizhou Li 28 16 0 22 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation Bolin Lai Fiona Ryan Wenqi Jia Miao Liu James M. Rehg EgoV 32 8 0 06 May 2023
Egocentric Auditory Attention Localization in Conversations Fiona Ryan Hao Jiang Abhinav Shukla James M. Rehg V. Ithapu EgoV 29 16 0 28 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset Tiago Roxo Joana Cabral Costa Pedro R. M. Inácio Hugo Manuel Proença 19 3 0 09 Mar 2023
A Light Weight Model for Active Speaker Detection Junhua Liao Haihan Duan Kanghui Feng Wanbing Zhao Yanbing Yang Liangyin Chen 35 36 0 08 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection Xizi Wang Feng Cheng Gedas Bertasius David J. Crandall 26 15 0 19 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection Chao Feng Ziyang Chen Andrew Owens 31 71 0 04 Jan 2023
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function Qing Wang Hang Chen Yannan Jiang Zhe Wang Yuyang Wang Jun Du Chin-Hui Lee 16 4 0 26 Oct 2022
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection Xuan-Bo Chen Haibin Wu Helen Meng Hung-yi Lee J. Jang AAML 20 3 0 03 Oct 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection Kyle Min Sourya Roy Subarna Tripathi T. Guha Somdeb Majumdar 26 36 0 15 Jul 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022 Yuanhang Zhang Susan Liang Shuang Yang Shiguang Shan 10 4 0 22 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection Abudukelimu Wuerkaixi You Zhang Z. Duan Changshui Zhang 18 10 0 21 Jun 2022
End-to-End Active Speaker Detection Juan Carlos León Alcázar M. Cordes Chen Zhao Guohao Li 24 27 0 27 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement Jun Xiong Yu Zhou Peng Zhang Lei Xie Wei Huang Yufei Zha 28 20 0 04 Mar 2022
Data standardization for robust lip sync C. Wang 38 0 0 13 Feb 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization Hao Jiang Calvin Murdock V. Ithapu EgoV 27 40 0 06 Jan 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection Sourya Roy Kyle Min Subarna Tripathi T. Guha Somdeb Majumdar 35 3 0 02 Dec 2021
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution Di Fu Fares Abawi Hugo C. C. Carneiro Matthias Kerzel Ziwei Chen Erik Strahl Xun Liu S. Wermter 17 6 0 02 Nov 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Guohao Li 65 51 0 11 Jan 2021
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 251 2,233 0 14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation Daniel Stoller Sebastian Ewert S. Dixon AI4TS 104 589 0 08 Jun 2018