1901.01342
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
5 January 2019
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
Liat Kaver
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
Papers citing
"AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection"
46 / 46 papers shown
Title
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
50
0
0
01 Jun 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
40
0
0
27 Mar 2024
Audio-visual child-adult speaker classification in dyadic interactions
Anfeng Xu
Kevin Huang
Tiantian Feng
Helen Tager-Flusberg
Shrikanth Narayanan
23
3
0
03 Oct 2023
Target Active Speaker Detection with Audio-visual Cues
Yiding Jiang
Ruijie Tao
Zexu Pan
Haizhou Li
28
16
0
22 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
38
11
0
05 May 2023
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
33
5
0
08 Apr 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
31
16
0
28 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
24
3
0
09 Mar 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
32
10
0
21 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
39
8
0
01 Dec 2022
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros
Laura Cabrera-Quiros
Hayley Hung
29
1
0
01 Nov 2022
Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma
Shrikanth Narayanan
24
3
0
24 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
26
36
0
15 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
57
19
0
07 Jul 2022
UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Yuanhang Zhang
Susan Liang
Shuang Yang
Shiguang Shan
10
4
0
22 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
18
10
0
21 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
38
131
0
18 Jun 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
18
3
0
01 Apr 2022
End-to-End Active Speaker Detection
Juan Carlos León Alcázar
M. Cordes
Chen Zhao
Guohao Li
24
27
0
27 Mar 2022
Visually Supervised Speaker Detection and Localization via Microphone Array
Davide Berghi
A. Hilton
Philip J. B. Jackson
27
11
0
07 Mar 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
34
41
0
06 Jan 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection
Sourya Roy
Kyle Min
Subarna Tripathi
T. Guha
Somdeb Majumdar
40
3
0
02 Dec 2021
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
17
6
0
02 Nov 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
275
1,026
0
13 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
Hugo C. C. Carneiro
C. Weber
S. Wermter
CVBM
31
7
0
01 Sep 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
52
20
0
17 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
29
36
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
22
176
0
14 Jul 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
29
45
0
07 Jun 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Baptiste Pouthier
L. Pilati
Leela K. Gudupudi
C. Bouveyron
F. Precioso
25
11
0
07 Jun 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
65
51
0
11 Jan 2021
HAA500: Human-Centric Atomic Action Dataset with Curated Videos
Jihoon Chung
Cheng-hsin Wuu
Hsuan-ru Yang
Yu-Wing Tai
Chi-Keung Tang
21
43
0
11 Sep 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
A Unified Framework for Shot Type Classification Based on Subject Centric Lens
Anyi Rao
Jiaze Wang
Linning Xu
Xuekun Jiang
Qingqiu Huang
Bolei Zhou
Dahua Lin
18
60
0
08 Aug 2020
Online Multi-modal Person Search in Videos
J. Xia
Anyi Rao
Qingqiu Huang
Linning Xu
Jiangtao Wen
Dahua Lin
28
28
0
08 Aug 2020
MovieNet: A Holistic Dataset for Movie Understanding
Qingqiu Huang
Yu Xiong
Anyi Rao
Jiaze Wang
Dahua Lin
VGen
40
234
0
21 Jul 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
19
437
0
23 Jun 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
32
61
0
20 May 2020
A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
Anyi Rao
Linning Xu
Yu Xiong
Guodong Xu
Qingqiu Huang
Bolei Zhou
Dahua Lin
33
111
0
06 Apr 2020
Multimodal active speaker detection and virtual cinematography for video conferencing
Ross Cutler
Ramin Mehran
Sam Johnson
Cha Zhang
Adam G. Kirk
Oliver Whyte
Adarsh Kowdle
18
7
0
10 Feb 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020