Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1804.00326
Cited By
Seeing Voices and Hearing Faces: Cross-modal biometric matching
1 April 2018
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Seeing Voices and Hearing Faces: Cross-modal biometric matching"
43 / 43 papers shown
Title
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy
Chenxu Zhang
Xiaohu Guo
Yapeng Tian
40
0
0
27 Mar 2024
Biometric Technologies and the Law: Developing a Taxonomy for Guiding Policymakers
Luis Felipe M. Ramos
16
0
0
27 Oct 2023
Audio-Visual Speaker Verification via Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
34
6
0
28 Sep 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
26
8
0
25 Feb 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
48
0
12 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
28
51
0
28 Nov 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang
Shi Zong
Jianbing Zhang
Jiajun Chen
Hongfu Liu
30
4
0
02 Oct 2022
Learning Branched Fusion and Orthogonal Projection for Face-Voice Association
M. S. Saeed
Shah Nawaz
M. H. Khan
S. Javed
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
27
4
0
22 Aug 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
Sindhu B. Hegde
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
16
1
0
17 Aug 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
68
0
0
28 Jun 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
35
26
0
20 Jun 2022
Sound Localization by Self-Supervised Time Delay Estimation
Ziyang Chen
David Fouhey
Andrew Owens
SSL
27
19
0
26 Apr 2022
Deep Multimodal Guidance for Medical Image Classification
Mayur Mallya
Ghassan Hamarneh
30
14
0
10 Mar 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
40
126
0
18 Jan 2022
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering
Shunyu Yao
Ruizhe Zhong
Yichao Yan
Guangtao Zhai
Xiaokang Yang
CVBM
24
90
0
03 Jan 2022
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
21
9
0
17 Oct 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
65
51
0
11 Jan 2021
Cross-modal Center Loss
Longlong Jing
Elahe Vahdani
Jiaxing Tan
Yingli Tian
3DPC
12
4
0
08 Aug 2020
A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition
Sitong Zhou
Homayoon Beigi
13
18
0
06 Aug 2020
Domain Adaptation without Source Data
Youngeun Kim
Donghyeon Cho
Kyeongtak Han
Priyadarshini Panda
Sungeun Hong
TTA
11
174
0
03 Jul 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Towards Robust Pattern Recognition: A Review
Xu-Yao Zhang
Cheng-Lin Liu
C. Suen
OOD
HAI
19
102
0
12 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
32
61
0
20 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
21
66
0
14 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
26
25
0
28 Apr 2020
Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
Xianfang Zeng
Yusu Pan
Mengmeng Wang
Jiangning Zhang
Yong Liu
CVBM
17
42
0
29 Mar 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
Joon Son Chung
Arsha Nagrani
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
18
59
0
05 Dec 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
32
41
0
19 Oct 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Some Research Problems in Biometrics: The Future Beckons
Arun Ross
Sudipta Banerjee
Cunjian Chen
Anurag Chowdhury
Vahid Mirjalili
Renu Sharma
Thomas Swearingen
Shivangi Yadav
33
50
0
12 May 2019
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
251
0
11 Apr 2019
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
30
270
0
16 Aug 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Yandong Wen
Mahmoud Al Ismail
Weiyang Liu
Bhiksha Raj
Rita Singh
FedML
22
70
0
12 Jul 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
263
2,233
0
14 Jun 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
140
0
02 May 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
176
784
0
16 Nov 2016
1