
VoxCeleb2: Deep Speaker Recognition

14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
arXiv:1806.05622

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 774 papers shown
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Kai Li
Fenghua Xie
Hang Chen
K. Yuan
Xiaolin Hu
34
14
0
21 Dec 2022
A Review of Speech-centric Trustworthy Machine Learning: Privacy, Safety, and Fairness
Tiantian Feng
Rajat Hebbar
Nicholas Mehlman
Xuan Shi
Aditya Kommineni
Shrikanth Narayanan
43
31
0
18 Dec 2022
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
Bowen Zhang
Chenyang Qi
Pan Zhang
Bo Zhang
Hsiang-Tao Wu
Dong Chen
Qifeng Chen
Yong Wang
Fang Wen
29
54
0
15 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taihao Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
21
8
0
14 Dec 2022
PV3D: A 3D Generative Model for Portrait Video Generation
Eric Xu
Jianfeng Zhang
Jun Hao Liew
Wenqing Zhang
Song Bai
Jiashi Feng
Mike Zheng Shou
VGen
34
20
0
13 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
48
0
12 Dec 2022
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Hideki Koike
35
34
0
09 Dec 2022
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
Zhentao Yu
Zixin Yin
Deyu Zhou
Duomin Wang
Finn Wong
Baoyuan Wang
DiffM
30
35
0
07 Dec 2022
Covariance Regularization for Probabilistic Linear Discriminant Analysis
Zhiyuan Peng
Mingjie Shao
Xuanji He
Xu Li
Tan Lee
Ke Ding
Guanglu Wan
12
1
0
06 Dec 2022
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Gil Knafo
Ohad Fried
28
5
0
01 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
37
8
0
01 Dec 2022
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline
Paul-Gauthier Noé
Xiaoxiao Miao
Xin Wang
Junichi Yamagishi
J. Bonastre
D. Matrouf
21
7
0
29 Nov 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
28
51
0
28 Nov 2022
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
Duomin Wang
Yu Deng
Zixin Yin
H. Shum
Baoyuan Wang
16
60
0
26 Nov 2022
Pose-disentangled Contrastive Learning for Self-supervised Facial Representation
Y. Liu
Wenbin Wang
Yibing Zhan
Shaoze Feng
Li-Yu Daisy Liu
Zhe Chen
SSL
24
13
0
24 Nov 2022
A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition
Jiacheng Zhang
Wenyi Yan
Ye Zhang
20
2
0
24 Nov 2022
Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation
Vinay Kothapally
John H. L. Hansen
31
9
0
22 Nov 2022
Robust Training for Speaker Verification against Noisy Labels
Zhihua Fang
Liang He
Hanhan Ma
Xiao-Min Guo
Lin Li
NoLa
24
3
0
22 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
33
37
0
21 Nov 2022
Multi-source Domain Adaptation for Text-independent Forensic Speaker Recognition
Zhenyu Wang
John H. L. Hansen
36
21
0
17 Nov 2022
SPACE: Speech-driven Portrait Animation with Controllable Expression
Francesco Ferroni
Arun Mallya
Ting-Chun Wang
Rafael Valle
Xuan Li
VGen
34
45
0
17 Nov 2022
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang
Xinsheng Wang
Linfu Xie
Yuan-Jui Chen
Qiao Tian
Yuping Wang
25
5
0
16 Nov 2022
Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence
Yicheng Hsu
Yonghan Lee
M. Bai
27
2
0
16 Nov 2022
Multi-Label Training for Text-Independent Speaker Identification
Yuqi Xue
27
0
0
14 Nov 2022
Towards A Unified Conformer Structure: from ASR to ASV Task
Dexin Liao
Tao Jiang
Feng Wang
Lin Li
Q. Hong
30
10
0
14 Nov 2022
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini
Mireia Díez
Alicia Lozano-Diez
L. Burget
37
15
0
12 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
27
60
0
12 Nov 2022
Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities
Yikang Wang
Xingming Wang
Hiromitsu Nishizaki
Ming Li
24
6
0
12 Nov 2022
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
19
14
0
09 Nov 2022
BER: Balanced Error Rate For Speaker Diarization
Tao Liu
K. Yu
20
4
0
08 Nov 2022
Pushing the limits of self-supervised speaker verification using regularized distillation framework
Yafeng Chen
Siqi Zheng
Haibo Wang
Luyao Cheng
Qian Chen
20
24
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
32
5
0
08 Nov 2022
Dynamic Kernels and Channel Attention for Low Resource Speaker Verification
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
19
0
0
03 Nov 2022
Convolution channel separation and frequency sub-bands aggregation for music genre classification
Ju-Sung Heo
Hyun-Seo Shin
Ju-ho Kim
Chan-yeong Lim
Ha-Jin Yu
16
1
0
03 Nov 2022
Late Audio-Visual Fusion for In-The-Wild Speaker Diarization
Zexu Pan
Gordon Wichern
François Germain
Aswin Shanmugam Subramanian
Jonathan Le Roux
VGen
21
1
0
02 Nov 2022
Autoregressive GAN for Semantic Unconditional Head Motion Generation
Louis Airale
Xavier Alameda-Pineda
Stéphane Lathuilière
Dominique Vaufreydaz
25
3
0
02 Nov 2022
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
Xingqi Chen
Jie Wang
Xiaoli Zhang
Weiqiang Zhang
Kunde Yang
AAML
26
7
0
02 Nov 2022
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
Zhengyang Chen
Bing Han
Xu Xiang
Houjun Huang
Bei Liu
Y. Qian
32
13
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
14
0
0
01 Nov 2022
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Mohan Shi
Jie Zhang
Zhihao Du
Fan Yu
Qian Chen
Shiliang Zhang
Lirong Dai
51
4
0
01 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
86
23
0
01 Nov 2022
Metric Learning for User-defined Keyword Spotting
Jaemin Jung
You-kyong Kim
Jihwan Park
Youshin Lim
Byeong-Yeol Kim
Youngjoon Jang
Joon Son Chung
40
9
0
01 Nov 2022
Disentangled representation learning for multilingual speaker recognition
Kihyun Nam
You-kyong Kim
Jaesung Huh
Hee-Soo Heo
Jee-weon Jung
Joon Son Chung
53
6
0
01 Nov 2022
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting
Zexu Pan
Wupeng Wang
Marvin Borsdorf
Haizhou Li
14
10
0
31 Oct 2022
Model Compression for DNN-based Speaker Verification Using Weight Quantization
Jingyu Li
W. Liu
Zhaoyang Zhang
Jiong Wang
Tan Lee
MQ
24
3
0
31 Oct 2022
Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification
Jingyu Li
Yusheng Tian
Tan Lee
30
9
0
31 Oct 2022
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
L. Attorresi
Davide Salvi
Clara Borrelli
Paolo Bestagini
Stefano Tubaro
18
22
0
31 Oct 2022
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
Ming Cheng
Weiqing Wang
Yucong Zhang
Xiaoyi Qin
Ming Li
VLM
56
32
0
28 Oct 2022
Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters
Junyi Peng
Themos Stafylakis
Rongzhi Gu
Oldřich Plchot
Ladislav Mošner
Lukáš Burget
Jan "Honza" Černocký
42
22
0
28 Oct 2022
Laugh Betrays You? Learning Robust Speaker Representation From Speech Containing Non-Verbal Fragments
Yuke Lin
Xiaoyi Qin
Huahua Cui
Zhenyi Zhu
Ming Li
16
1
0
28 Oct 2022