1603.08907
Cited By
Cross-modal Supervision for Learning Active Speaker Detection in Video
29 March 2016
Punarjay Chakravarty
Tinne Tuytelaars
Papers citing "Cross-modal Supervision for Learning Active Speaker Detection in Video"
13 / 13 papers shown
Title
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
24
3
0
09 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
42
8
0
01 Dec 2022
Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma
Shrikanth Narayanan
32
3
0
24 Sep 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
26
36
0
15 Jul 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection
Sourya Roy
Kyle Min
Subarna Tripathi
T. Guha
Somdeb Majumdar
43
3
0
02 Dec 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
29
36
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
27
176
0
14 Jul 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Baptiste Pouthier
L. Pilati
Leela K. Gudupudi
C. Bouveyron
F. Precioso
30
11
0
07 Jun 2021
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Multimodal active speaker detection and virtual cinematography for video conferencing
Ross Cutler
Ramin Mehran
Sam Johnson
Cha Zhang
Adam G. Kirk
Oliver Whyte
Adarsh Kowdle
26
7
0
10 Feb 2020
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
49
2,252
0
26 Jun 2017