ResearchTrend.AI
VoxCeleb2: Deep Speaker Recognition


14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 773 papers shown
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
32
3
0
27 Jun 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Siqi Zheng
Luyao Cheng
Yafeng Chen
Haibo Wang
Qian Chen
16
16
0
27 Jun 2023
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
22
21
0
25 Jun 2023
Audio-Driven 3D Facial Animation from In-the-Wild Videos
Liying Lu
Tianke Zhang
Yunfei Liu
Xuangeng Chu
Yu Li
VGen
47
3
0
20 Jun 2023
Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials
Malikeh Ehghaghi
Marija Stanojevic
Ali Akram
Jekaterina Novikova
11
1
0
20 Jun 2023
Emotional Speech-Driven Animation with Content-Emotion Disentanglement
Radek Daněček
Kiran Chhatre
Shashank Tripathi
Yandong Wen
Michael J. Black
Timo Bolkart
16
66
0
15 Jun 2023
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
Ricong Huang
Puxiang Lai
Yipeng Qin
Guanbin Li
CVBM
DiffM
27
14
0
13 Jun 2023
Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
18
3
0
13 Jun 2023
NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection
Yu Chen
Yang Yu
R. Ni
Yao-Min Zhao
Haoliang Li
36
2
0
12 Jun 2023
Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction
Tomoya Yoshinaga
Keitaro Tanaka
Shigeo Morishima
30
0
0
10 Jun 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Xize Cheng
Tao Jin
Lin Li
Wang Lin
Xinyu Duan
Zhou Zhao
VLM
21
15
0
10 Jun 2023
IFaceUV: Intuitive Motion Facial Image Generation by Identity Preservation via UV map
Han-Lim Lee
Yu-Te Ku
Eunseok Kim
Seungryul Baek
3DH
33
0
0
08 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
27
11
0
05 Jun 2023
End-to-End Joint Target and Non-Target Speakers ASR
Ryo Masumura
Naoki Makishima
Taiga Yamane
Yoshihiko Yamazaki
Saki Mizuno
...
Akihiko Takashima
Satoshi Suzuki
Takafumi Moriya
Nobukatsu Hojo
Atsushi Ando
27
5
0
04 Jun 2023
ALO-VC: Any-to-any Low-latency One-shot Voice Conversion
Bo Wang
Damien Ronssin
Milos Cernak
BDL
25
3
0
01 Jun 2023
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition
Ashutosh Chaubey
Sparsh Sinha
Susmita Ghose
19
0
0
01 Jun 2023
Speech inpainting: Context-based speech synthesis guided by video
Juan F. Montesinos
Daniel Michelsanti
G. Haro
Zheng-Hua Tan
Jesper Jensen
21
3
0
01 Jun 2023
Speaker verification using attentive multi-scale convolutional recurrent network
Yanxiong Li
Zhongjie Jiang
Wenchang Cao
Qisheng Huang
27
8
0
01 Jun 2023
How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning
Hye-jin Shim
Rosa González Hautamäki
Md. Sahidullah
Tomi Kinnunen
AAML
19
5
0
31 May 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
J. Choi
Minsu Kim
Y. Ro
29
24
0
31 May 2023
Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
Se Jin Park
Minsu Kim
J. Choi
Y. Ro
CVBM
27
4
0
31 May 2023
Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization
Doyeon Kim
Eunji Ko
Hyunsung Kim
Yunji Kim
Junho Kim
Dong Min
Junmo Kim
Sung Ju Hwang
DiffM
VGen
41
1
0
30 May 2023
Towards single integrated spoofing-aware speaker verification embeddings
Sung Hwan Mun
Hye-jin Shim
Hemlata Tak
Xin Wang
Xuechen Liu
...
Junichi Yamagishi
Nicholas W. D. Evans
Tomi Kinnunen
N. Kim
Jee-weon Jung
46
11
0
30 May 2023
Speaker anonymization using orthogonal Householder neural network
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
N. Tomashenko
BDL
26
18
0
30 May 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
L. Serafini
Samuele Cornell
Giovanni Morrone
Enrico Zovato
A. Brutti
S. Squartini
47
9
0
29 May 2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
Ju-Sung Heo
Chan-yeong Lim
Ju-ho Kim
Hyun-Seo Shin
Ha-Jin Yu
29
2
0
27 May 2023
CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition
Lantian Li
Xiaolou Li
Haoyu Jiang
Cheng Chen
Ruihai Hou
Dong Wang
SLR
11
5
0
25 May 2023
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
18
27
0
25 May 2023
P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification
Xiyuan Wang
Fangyuan Wang
Bo Xu
Liang Xu
Jing Xiao
21
6
0
24 May 2023
QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
Guangke Chen
Yedi Zhang
Zhe Zhao
Fu Song
AAML
41
11
0
23 May 2023
An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification
Yafeng Chen
Siqi Zheng
Haibo Wang
Luyao Cheng
Qian Chen
Jiajun Qi
24
38
0
22 May 2023
Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification
Zhuo Li
Jingze Lu
Z. Zhao
Wenchao Wang
Pengyuan Zhang
27
1
0
22 May 2023
The HCCL system for VoxCeleb Speaker Recognition Challenge 2022
Zhenduo Zhao
Zhuo Li
Wenchao Wang
Pengyuan Zhang
22
4
0
22 May 2023
LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model
K. Lee
Patrick Kwon
Myung Ki Lee
Namhyuk Ahn
Junsoo Lee
9
1
0
17 May 2023
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion
Zhichao Wang
Liumeng Xue
Qiuqiang Kong
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
BDL
17
3
0
12 May 2023
DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation
Fa-Ting Hong
Li Shen
Dan Xu
3DH
CVBM
21
15
0
10 May 2023
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
Jiazhi Guan
Zhanwang Zhang
Hang Zhou
Tianshu Hu
Kaisiyuan Wang
...
Haocheng Feng
Jingtuo Liu
Errui Ding
Ziwei Liu
Jingdong Wang
42
57
0
09 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
34
1
0
09 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Zheng-Yan Sheng
Yang Ai
Zhenhua Ling
CVBM
27
5
0
09 May 2023
Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
Ekta Prashnani
Koki Nagano
Shalini De Mello
D. Luebke
Orazio Gallo
31
2
0
05 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
81
6
0
05 May 2023
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning
Chao Xu
Sijun Tan
Jibang Wu
Yue Han
Wenqing Chu
Xiaohui Bei
Chengjie Wang
Haifeng Xu
Yong Liu
CVBM
54
36
0
04 May 2023
Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification
Iván López-Espejo
Santi Prieto
Alfonso Ortega
Eduardo Lleida Solano
22
0
0
03 May 2023
Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Théophile Cabannes
Shreya Ghosh
Raphaël Marinier
Tom Gedeon
Alexandre M. Bayen
Munawar Hayat
86
22
0
03 May 2023
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
David Budaghyan
Charles C. Onu
Arsenii Gorin
Cem Subakan
Doina Precup
45
6
0
01 May 2023
StyleLipSync: Style-based Personalized Lip-sync Video Generation
Taekyung Ki
Dong Min
45
11
0
30 Apr 2023
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
Aggelina Chatziagapi
Dimitris Samaras
3DH
CVBM
33
3
0
25 Apr 2023
A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
34
20
0
21 Apr 2023
FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits
Polina Karpikova
Radionova Ekaterina
A. Yaschenko
Andrei A. Spiridonov
Leonid Kostyushko
Riccardo Fabbricatore
Aleksei Ivakhnenko
25
3
0
20 Apr 2023
High-Fidelity and Freely Controllable Talking Head Video Generation
Yue Gao
Yuan-yuan Zhou
Jinglu Wang
Xiao Li
Xiang Ming
Yan Lu
VGen
27
35
0
20 Apr 2023