1806.05622
VoxCeleb2: Deep Speaker Recognition
14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
Papers citing
"VoxCeleb2: Deep Speaker Recognition"
50 / 774 papers shown
Title
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Kai Li
Fenghua Xie
Hang Chen
K. Yuan
Xiaolin Hu
34
14
0
21 Dec 2022
A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness
Tiantian Feng
Rajat Hebbar
Nicholas Mehlman
Xuan Shi
Aditya Kommineni
Shrikanth Narayanan
43
31
0
18 Dec 2022
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
Bo Zhang
Chenyang Qi
Pan Zhang
Bo Zhang
Hsiang-Tao Wu
Dong Chen
Qifeng Chen
Yong Wang
Fang Wen
29
54
0
15 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
21
8
0
14 Dec 2022
PV3D: A 3D Generative Model for Portrait Video Generation
Eric Xu
Jianfeng Zhang
Jun Hao Liew
Wenqing Zhang
Song Bai
Jiashi Feng
Mike Zheng Shou
VGen
34
20
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
48
0
12 Dec 2022
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Koike Hideki
35
34
0
09 Dec 2022
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
Zhentao Yu
Zixin Yin
Deyu Zhou
Duomin Wang
Finn Wong
Baoyuan Wang
DiffM
30
35
0
07 Dec 2022
Covariance Regularization for Probabilistic Linear Discriminant Analysis
Zhiyuan Peng
Mingjie Shao
Xuanji He
Xu Li
Tan Lee
Ke Ding
Guanglu Wan
12
1
0
06 Dec 2022
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Gil Knafo
Ohad Fried
28
5
0
01 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
37
8
0
01 Dec 2022
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
Paul-Gauthier Noé
Xiaoxiao Miao
Xin Wang
Junichi Yamagishi
J. Bonastre
D. Matrouf
21
7
0
29 Nov 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
28
51
0
28 Nov 2022
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
Duomin Wang
Yu Deng
Zixin Yin
H. Shum
Baoyuan Wang
16
60
0
26 Nov 2022
Pose-disentangled Contrastive Learning for Self-supervised Facial Representation
Y. Liu
Wenbin Wang
Yibing Zhan
Shaoze Feng
Li-Yu Daisy Liu
Zhe Chen
SSL
24
13
0
24 Nov 2022
A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition
Jiacheng Zhang
Wenyi Yan
Ye Zhang
20
2
0
24 Nov 2022
Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation
Vinay Kothapally
John H. L. Hansen
31
9
0
22 Nov 2022
Robust Training for Speaker Verification against Noisy Labels
Zhihua Fang
Liang He
Hanhan Ma
Xiao-Min Guo
Lin Li
NoLa
24
3
0
22 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
33
37
0
21 Nov 2022
Multi-source Domain Adaptation for Text-independent Forensic Speaker Recognition
Zhenyu Wang
John H. L. Hansen
36
21
0
17 Nov 2022
SPACE: Speech-driven Portrait Animation with Controllable Expression
Francesco Ferroni
Arun Mallya
Ting-Chun Wang
Rafael Valle
Xuan Li
VGen
34
45
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
25
5
0
16 Nov 2022
Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence
Yicheng Hsu
Yonghan Lee
M. Bai
27
2
0
16 Nov 2022
Multi-Label Training for Text-Independent Speaker Identification
Yuqi Xue
27
0
0
14 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
30
10
0
14 Nov 2022
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini
Mireia Díez
Alicia Lozano-Diez
L. Burget
37
15
0
12 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
27
60
0
12 Nov 2022
Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities
Yikang Wang
Xingming Wang
Hiromitsu Nishizaki
Ming Li
24
6
0
12 Nov 2022
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
19
14
0
09 Nov 2022
BER: Balanced Error Rate For Speaker Diarization
Tao Liu
K. Yu
20
4
0
08 Nov 2022
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Yafeng Chen
Siqi Zheng
Haibo Wang
Luyao Cheng
Qian Chen
20
24
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
32
5
0
08 Nov 2022
Dynamic Kernels and Channel Attention for Low Resource Speaker Verification
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
19
0
0
03 Nov 2022
Convolution channel separation and frequency sub-bands aggregation for music genre classification
Ju-Sung Heo
Hyun-Seo Shin
Ju-ho Kim
Chan-yeong Lim
Ha-Jin Yu
16
1
0
03 Nov 2022
Late Audio-Visual Fusion for In-The-Wild Speaker Diarization
Zexu Pan
Gordon Wichern
François Germain
Aswin Shanmugam Subramanian
Jonathan Le Roux
VGen
21
1
0
02 Nov 2022
Autoregressive GAN for Semantic Unconditional Head Motion Generation
Louis Airale
Xavier Alameda-Pineda
Stéphane Lathuilière
Dominique Vaufreydaz
25
3
0
02 Nov 2022
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
Xingqi Chen
Jie Wang
Xiaoli Zhang
Weiqiang Zhang
Kunde Yang
AAML
26
7
0
02 Nov 2022
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
Zhengyang Chen
Bing Han
Xu Xiang
Houjun Huang
Bei Liu
Y. Qian
32
13
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
14
0
0
01 Nov 2022
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Mohan Shi
Jie Zhang
Zhihao Du
Fan Yu
Qian Chen
Shiliang Zhang
Lirong Dai
51
4
0
01 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
86
23
0
01 Nov 2022
Metric Learning for User-defined Keyword Spotting
Jaemin Jung
You-kyong Kim
Jihwan Park
Youshin Lim
Byeong-Yeol Kim
Youngjoon Jang
Joon Son Chung
40
9
0
01 Nov 2022
Disentangled representation learning for multilingual speaker recognition
Kihyun Nam
You-kyong Kim
Jaesung Huh
Hee-Soo Heo
Jee-weon Jung
Joon Son Chung
53
6
0
01 Nov 2022
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting
Zexu Pan
Wupeng Wang
Marvin Borsdorf
Haizhou Li
14
10
0
31 Oct 2022
Model Compression for DNN-based Speaker Verification Using Weight Quantization
Jingyu Li
W. Liu
Zhaoyang Zhang
Jiong Wang
Tan Lee
MQ
24
3
0
31 Oct 2022
Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification
Jingyu Li
Yusheng Tian
Tan Lee
30
9
0
31 Oct 2022
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
L. Attorresi
Davide Salvi
Clara Borrelli
Paolo Bestagini
Stefano Tubaro
18
22
0
31 Oct 2022
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
Ming Cheng
Weiqing Wang
Yucong Zhang
Xiaoyi Qin
Ming Li
VLM
56
32
0
28 Oct 2022
Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters
Junyi Peng
Themos Stafylakis
Rongzhi Gu
Oldřich Plchot
Ladislav Mošner
Lukáš Burget
Jan "Honza" Černocký
42
22
0
28 Oct 2022
Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments
Yuke Lin
Xiaoyi Qin
Huahua Cui
Zhenyi Zhu
Ming Li
16
1
0
28 Oct 2022