2106.03932
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
7 June 2021
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
Papers citing
"How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild"
27 / 27 papers shown
Title
LASER: Lip Landmark Assisted Speaker Detection for Robustness
Le Thien Phuc Nguyen
Zhuliang Yu
Yong Jae Lee
39
1
0
21 Jan 2025
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection
Andrea Appiani
Cigdem Beyan
CLIP
VLM
28
0
0
18 Oct 2024
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Victoria Mingote
Alfonso Ortega
A. Miguel
Eduardo Lleida
30
0
0
09 Sep 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
40
0
0
27 Mar 2024
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
Chaeyoung Jung
Suyeon Lee
Kihyun Nam
Kyeongha Rho
You Jin Kim
Youngjoon Jang
Joon Son Chung
20
9
0
21 Sep 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
36
4
0
10 Jul 2023
Target Active Speaker Detection with Audio-visual Cues
Yiding Jiang
Ruijie Tao
Zexu Pan
Haizhou Li
28
16
0
22 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
32
8
0
06 May 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
29
16
0
28 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
19
3
0
09 Mar 2023
A Light Weight Model for Active Speaker Detection
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
35
36
0
08 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng
Ziyang Chen
Andrew Owens
31
71
0
04 Jan 2023
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
16
4
0
26 Oct 2022
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Xuan-Bo Chen
Haibin Wu
Helen Meng
Hung-yi Lee
J. Jang
AAML
20
3
0
03 Oct 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
26
36
0
15 Jul 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
10
4
0
22 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
18
10
0
21 Jun 2022
End-to-End Active Speaker Detection
Juan Carlos León Alcázar
M. Cordes
Chen Zhao
Guohao Li
24
27
0
27 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
Jun Xiong
Yu Zhou
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
28
20
0
04 Mar 2022
Data standardization for robust lip sync
C. Wang
38
0
0
13 Feb 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
27
40
0
06 Jan 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection
Sourya Roy
Kyle Min
Subarna Tripathi
T. Guha
Somdeb Majumdar
35
3
0
02 Dec 2021
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
17
6
0
02 Nov 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
65
51
0
11 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
251
2,233
0
14 Jun 2018
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Daniel Stoller
Sebastian Ewert
S. Dixon
AI4TS
104
589
0
08 Jun 2018