VoxCeleb2: Deep Speaker Recognition

14 June 2018

Joon Son Chung

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 773 papers shown

A comprehensive study on self-supervised distillation for speaker representation learning Zhengyang Chen Yao Qian Bing Han Y. Qian Michael Zeng SSL 39 17 0 28 Oct 2022
Speaker recognition with two-step multi-modal deep cleansing Ruijie Tao Kong Aik Lee Zhan Shi Haizhou Li NoLa 47 13 0 28 Oct 2022
Coverage-centric Coreset Selection for High Pruning Rates Haizhong Zheng Rui Liu Fan Lai Atul Prakash 33 53 0 28 Oct 2022
Toroidal Probabilistic Spherical Discriminant Analysis Anna Silnova Niko Brummer Albert Swart L. Burget 33 2 0 27 Oct 2022
Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs Ruijie Tao Kong Aik Lee Rohan Kumar Das Ville Hautamaki Haizhou Li SSL 29 8 0 27 Oct 2022
Privacy-preserving Automatic Speaker Diarization Francisco Teixeira A. Abad Bhiksha Raj Isabel Trancoso 27 4 0 26 Oct 2022
In search of strong embedding extractors for speaker diarisation Jee-weon Jung Hee-Soo Heo Bong-Jin Lee Jaesung Huh A. Brown Youngki Kwon Shinji Watanabe Joon Son Chung 44 16 0 26 Oct 2022
Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation Evonne Lee Guangzhi Sun C. Zhang P. Woodland 27 1 0 24 Oct 2022
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation Xiaoyu Liu Xu Li Joan Serrà 44 9 0 23 Oct 2022
Low-Resource Multilingual and Zero-Shot Multispeaker TTS Florian Lux Julia Koch Ngoc Thang Vu 38 22 0 21 Oct 2022
Large-scale learning of generalised representations for speaker recognition Jee-weon Jung Hee-Soo Heo Bong-Jin Lee Jaesong Lee Hye-jin Shim Youngki Kwon Joon Son Chung Shinji Watanabe CVBM 31 6 0 20 Oct 2022
How to Boost Face Recognition with StyleGAN? Artem Sevastopolsky Yury Malkov N. Durasov L. Verdoliva Matthias Nießner PICV 28 13 0 18 Oct 2022
Risk of re-identification for shared clinical speech recordings D. Wiepert B. Malin Joseph James Duffy Rene L. Utianski John L. Stricker David T. Jones Hugo Botha 40 0 0 18 Oct 2022
How to Leverage DNN-based speech enhancement for multi-channel speaker verification? Sandipana Dowerah Romain Serizel D. Jouvet Mohammad MohammadAmini D. Matrouf 34 0 0 17 Oct 2022
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations Themos Stafylakis Ladislav Mošner Sofoklis Kakouros Oldrich Plchot L. Burget J. Černocký SSL 40 8 0 15 Oct 2022
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy Sarina Meyer Pascal Tilli Pavel Denisov Florian Lux Julia Koch Ngoc Thang Vu 23 31 0 13 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech Byoung Jin Choi Myeonghun Jeong Minchan Kim Sung Hwan Mun N. Kim DiffM 27 5 0 12 Oct 2022
Controllable Radiance Fields for Dynamic Face Synthesis Peiye Zhuang Liqian Ma Oluwasanmi Koyejo A. Schwing CVBM 3DH 18 11 0 11 Oct 2022
Revisiting Self-Supervised Contrastive Learning for Facial Expression Recognition Yuxuan Shu Xiao Gu Guangyao Yang Benny Lo SSL 54 17 0 08 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference Yangfu Li Xiaodan Lin Jiaxin Yang 19 0 0 06 Oct 2022
Geometry Driven Progressive Warping for One-Shot Face Animation Yatao Zhong F. Amjadi Ilya Zharkov 3DH CVBM 21 1 0 05 Oct 2022
Learning Video-independent Eye Contact Segmentation from In-the-Wild Videos Tianyi Wu Yusuke Sugano 14 0 0 05 Oct 2022
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward Awais Khan K. Malik James Ryan Mikul Saravanan AAML 48 11 0 02 Oct 2022
Deepfake audio detection by speaker verification Alessandro Pianese D. Cozzolino Giovanni Poggi L. Verdoliva 38 39 0 28 Sep 2022
StyleSwap: Style-Based Generator Empowers Robust Face Swapping Zhi-liang Xu Hang Zhou Zhibin Hong Ziwei Liu Jiaming Liu Zhizhi Guo Junyu Han Jingtuo Liu Errui Ding Jingdong Wang CVBM 39 77 0 27 Sep 2022
StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment Stella Bounareli Christos Tzelepis Vasileios Argyriou Ioannis Patras Georgios Tzimiropoulos CVBM 27 17 0 27 Sep 2022
Unsupervised active speaker detection in media content using cross-modal information Rahul Sharma Shrikanth Narayanan 24 3 0 24 Sep 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022 Qutang Cai Guoqiang Hong Zhijian Ye Ximin Li Haizhou Li 38 7 0 23 Sep 2022
The SpeakIn System Description for CNSRC2022 Yu Zheng Yihao Chen Jinghan Peng Yajun Zhang Min Liu Minqiang Xu 26 2 0 22 Sep 2022
Gemino: Practical and Robust Neural Compression for Video Conferencing Vibhaalakshmi Sivaraman Pantea Karimi Vedantha Venkatapathy Mehrdad Khani Shirkoohi Sadjad Fouladi M. Alizadeh F. Durand Vivienne Sze 3DH 44 17 0 21 Sep 2022
FNeVR: Neural Volume Rendering for Face Animation Bo-Wen Zeng Bo-Ye Liu Hong Li Xuhui Liu Jianzhuang Liu Dapeng Chen Wei Peng Baochang Zhang CVBM 3DH 48 26 0 21 Sep 2022
Relaxed Attention for Transformer Models Timo Lohrenz Björn Möller Zhengyang Li Tim Fingscheidt KELM 29 11 0 20 Sep 2022
SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022 Zhengyang Chen Bing Han Xu Xiang Houjun Huang Bei Liu Y. Qian 17 8 0 19 Sep 2022
AutoLV: Automatic Lecture Video Generator Wen Wang Yang Song Sanjay Jha VGen 18 3 0 19 Sep 2022
Pay Attention to Hard Trials Lantian Li Di Wang Dong Wang 48 1 0 10 Sep 2022
Learning Audio-Visual embedding for Person Verification in the Wild Peiwen Sun Shanshan Zhang Zishan Liu Yougen Yuan Tao Zhang Honggang Zhang Pengfei Hu 30 4 0 09 Sep 2022
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages Tahir Javed Kaushal Bhogale A. Raman Anoop Kunchukuttan Pratyush Kumar Mitesh M. Khapra ELM 30 20 0 24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective Yake Wei Di Hu Yapeng Tian Xuelong Li 46 55 0 20 Aug 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors Sindhu B. Hegde Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar CVBM 16 1 0 17 Aug 2022
Disentangled Speaker Representation Learning via Mutual Information Minimization Sung Hwan Mun Mingrui Han Minchan Kim Dongjune Lee N. Kim DRL 41 9 0 17 Aug 2022
Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer via Local-Style-Aware Hair Alignment Taewoo Kim Chaeyeon Chung Yoonseong Kim S. Park Kangyeol Kim Jaegul Choo 3DH 39 20 0 16 Aug 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech Jaejin Cho Jesús Villalba Laureano Moro Velázquez Najim Dehak SSL 39 18 0 10 Aug 2022
Robust Acoustic Domain Identification with its Application to Speaker Diarization Kishore Kumar A Shefali Waldekar Md. Sahidullah G. Saha 24 0 0 05 Aug 2022
Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis Trisha Mittal Ritwik Sinha Viswanathan Swaminathan John Collomosse Tianyi Zhou 30 9 0 26 Jul 2022
Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss Riccardo Franceschini Enrico Fini Cigdem Beyan Alessandro Conti F. Arrigoni Elisa Ricci SSL OffRL 34 16 0 23 Jul 2022
Telepresence Video Quality Assessment Zhenqiang Ying Deepti Ghadiyaram A. Bovik 16 5 0 20 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 33 28 0 19 Jul 2022
Multi-channel target speech enhancement based on ERB-scaled spatial coherence features Yicheng Hsu Yonghan Lee M. Bai 25 1 0 17 Jul 2022
MegaPortraits: One-shot Megapixel Neural Head Avatars Nikita Drobyshev Jenya Chelishev Taras Khakhulin Aleksei Ivakhnenko Victor Lempitsky Egor Zakharov 28 108 0 15 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality Wei-Ning Hsu Bowen Shi SSL VLM 27 41 0 14 Jul 2022