Cross-modal Supervision for Learning Active Speaker Detection in Video

Cross-modal Supervision for Learning Active Speaker Detection in Video

29 March 2016

Punarjay Chakravarty

Tinne Tuytelaars

Papers citing "Cross-modal Supervision for Learning Active Speaker Detection in Video"

13 / 13 papers shown

Title
WASD: A Wilder Active Speaker Detection Dataset Tiago Roxo Joana Cabral Costa Pedro R. M. Inácio Hugo Manuel Proença 24 3 0 09 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection Xizi Wang Feng Cheng Gedas Bertasius David J. Crandall 26 15 0 19 Jan 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection Rahul Sharma Shrikanth Narayanan 42 8 0 01 Dec 2022
Unsupervised active speaker detection in media content using cross-modal information Rahul Sharma Shrikanth Narayanan 32 3 0 24 Sep 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection Kyle Min Sourya Roy Subarna Tripathi T. Guha Somdeb Majumdar 26 36 0 15 Jul 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection Sourya Roy Kyle Min Subarna Tripathi T. Guha Somdeb Majumdar 43 3 0 02 Dec 2021
The Right to Talk: An Audio-Visual Transformer Approach Thanh-Dat Truong C. Duong T. D. Vu H. Pham Bhiksha Raj Ngan Le Khoa Luu 63 36 0 06 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection Yuanhang Zhang Susan Liang Shuang Yang Xiao-Chang Liu Zhongqin Wu Shiguang Shan Xilin Chen CVBM 29 36 0 05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection Ruijie Tao Zexu Pan Rohan Kumar Das Xinyuan Qian Mike Zheng Shou Haizhou Li 27 176 0 14 Jul 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion Baptiste Pouthier L. Pilati Leela K. Gudupudi C. Bouveyron F. Precioso 30 11 0 07 Jun 2021
Self-Supervised Learning of Audio-Visual Objects from Video Triantafyllos Afouras Andrew Owens Joon Son Chung Andrew Zisserman SSL 19 253 0 10 Aug 2020
Multimodal active speaker detection and virtual cinematography for video conferencing Ross Cutler Ramin Mehran Sam Johnson Cha Zhang Adam G. Kirk Oliver Whyte Adarsh Kowdle 26 7 0 10 Feb 2020
VoxCeleb: a large-scale speaker identification dataset Arsha Nagrani Joon Son Chung Andrew Zisserman 49 2,252 0 26 Jun 2017