VoxCeleb: a large-scale speaker identification dataset

26 June 2017

Arsha Nagrani

Joon Son Chung

Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown

Configurable Privacy-Preserving Automatic Speech Recognition Ranya Aloufi Hamed Haddadi David E. Boyle 69 10 0 01 Apr 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Adam Polyak Yossi Adi Jade Copet Eugene Kharitonov Kushal Lakhotia Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux 130 318 0 01 Apr 2021
Improved Meta-Learning Training for Speaker Verification Yafeng Chen Wu Guo Bin Gu 118 7 0 29 Mar 2021
Scalable and Efficient Neural Speech Coding: A Hybrid Design Kai Zhen Jongmo Sung Mi Suk Lee Seung-Wha Beack Minje Kim 95 14 0 27 Mar 2021
EfficientTDNN: Efficient Architecture Search for Speaker Recognition Rui Wang Zhihua Wei Haoran Duan S. Ji Yang Long Zhenhou Hong 138 18 0 25 Mar 2021
PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation Wai Ting Cheung Gyeongsu Chae VGen 108 2 0 22 Mar 2021
USTC-NELSLIP System Description for DIHARD-III Challenge Yuxuan Wang Maokui He Shutong Niu Lei Sun Tian Gao Xin Fang Jia Pan Jun Du Chin-Hui Lee 76 30 0 19 Mar 2021
KoDF: A Large-scale Korean DeepFake Detection Dataset Patrick Kwon J. You Gyuhyeon Nam Sungwoo Park Gyeongsu Chae 109 104 0 18 Mar 2021
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning Siyang Yuan Pengyu Cheng Ruiyi Zhang Weituo Hao Zhe Gan Lawrence Carin DRL 66 61 0 17 Mar 2021
Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association Peisong Wen Qianqian Xu Yangbangyan Jiang Zhiyong Yang Yuan He Qingming Huang CVBM 61 33 0 12 Mar 2021
Learning spectro-temporal representations of complex sounds with parameterized neural networks Rachid Riad Julien Karadayi Anne-Catherine Bachoud-Lévi Emmanuel Dupoux 49 7 0 12 Mar 2021
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi Noboru Harada K. Kashino SSL 107 179 0 11 Mar 2021
EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition Maurice Gerczuk Shahin Amiriparian Sandra Ottl Björn Schuller 95 59 0 10 Mar 2021
Am I a Real or Fake Celebrity? Measuring Commercial Face Recognition Web APIs under Deepfake Impersonation Attack Shahroz Tariq Sowon Jeon Simon S. Woo 73 25 0 01 Mar 2021
Learnable MFCCs for Speaker Verification Xuechen Liu Md. Sahidullah Tomi Kinnunen 54 17 0 20 Feb 2021
AudioVisual Speech Synthesis: A brief literature review Efthymios Georgiou Athanasios Katsamanis 25 0 0 18 Feb 2021
Biometrics in the Era of COVID-19: Challenges and Opportunities M. Gomez-Barrero P. Drozdowski Christian Rathgeb J. Patino Massimiliano Todisco A. Nautsch Naser Damer Jannier Priesnitz Nicholas W. D. Evans Christoph Busch 79 54 0 18 Feb 2021
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models Haibin Wu Xu Li Andy T. Liu Zhiyong Wu Helen Meng Hung-yi Lee AAML 86 41 0 14 Feb 2021
A Multi-View Approach To Audio-Visual Speaker Verification Leda Sari Kritika Singh Jiatong Zhou Lorenzo Torresani Nayan Singhal Yatharth Saraf 123 38 0 11 Feb 2021
ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech A. Nautsch Xin Wang Nicholas W. D. Evans Tomi Kinnunen Ville Vestman Massimiliano Todisco Héctor Delgado Md. Sahidullah Junichi Yamagishi Kong Aik Lee 194 154 0 11 Feb 2021
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning Giuseppe Ruggiero Enrico Zovato Luigi Di Caro V. Pollet DiffM 63 10 0 10 Feb 2021
The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge Weiqing Wang Qingjian Lin Danwei Cai Lin Yang Ming Li 35 8 0 06 Feb 2021
Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks Peter Wu Paul Pu Liang Jiatong Shi Ruslan Salakhutdinov Shinji Watanabe Louis-Philippe Morency 63 9 0 22 Jan 2021
LEAF: A Learnable Frontend for Audio Classification Neil Zeghidour O. Teboul Félix de Chaumont Quitry Marco Tagliasacchi VLM AAML 134 148 0 21 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Guohao Li 130 52 0 11 Jan 2021
FakeBuster: A DeepFakes Detection Tool for Video Conferencing Scenarios V. Mehta Parul Gupta Ramanathan Subramanian Abhinav Dhall CVBM 67 22 0 09 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency Ruohan Gao Kristen Grauman CVBM 247 202 0 08 Jan 2021
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure Jui Shah Yaman Kumar Singla Changyou Chen R. Shah 93 81 0 02 Jan 2021
Generalized Operating Procedure for Deep Learning: an Unconstrained Optimal Design Perspective Shen Chen Mingwei Zhang Jiamin Cui Wei Yao CVBM 49 0 0 31 Dec 2020
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks Federico Landini Jan Profant Mireia Díez L. Burget 287 209 0 29 Dec 2020
A Principle Solution for Enroll-Test Mismatch in Speaker Recognition Lantian Li Dong Wang Jiawen Kang Renyu Wang Jingqian Wu Zhendong Gao Xiao Chen 65 7 0 23 Dec 2020
CN-Celeb: multi-genre speaker recognition Lantian Li Ruiqi Liu Jiawen Kang Yue Fan Hao Cui Yunqi Cai Ravichander Vipperla Tianshi Zheng Dong Wang 101 123 0 23 Dec 2020
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification Wei Yao Shen Chen Jiamin Cui Yaolin Lou 76 6 0 21 Dec 2020
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording Cong Han Yi Luo Chenda Li Tianyan Zhou K. Kinoshita ... Marc Delcroix Hakan Erdogan J. Hershey N. Mesgarani Zhuo Chen 58 8 0 17 Dec 2020
HeadGAN: One-shot Neural Head Synthesis and Editing M. Doukas Stefanos Zafeiriou V. Sharmanska CVBM 3DH 62 129 0 15 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 68 5 0 14 Dec 2020
DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning Mufan Sang Wei Xia John H. L. Hansen OOD DRL 94 23 0 12 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge Arsha Nagrani Joon Son Chung Jaesung Huh Andrew Brown Ernesto Coto Weidi Xie Mitchell McLaren D. Reynolds Andrew Zisserman 76 74 0 12 Dec 2020
Exploring wav2vec 2.0 on speaker verification and language identification Zhiyun Fan Meng Li Shiyu Zhou Bo Xu 159 203 0 11 Dec 2020
Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation Paul-Gauthier Noé Mohammad MohammadAmini D. Matrouf Titouan Parcollet Andreas Nautsch J. Bonastre 100 28 0 08 Dec 2020
A Study of Few-Shot Audio Classification Piper Wolters Chris Careaga Brian Hutchinson Lauren A. Phillips 107 10 0 02 Dec 2020
Joint gender and age estimation based on speech signals using x-vectors and transfer learning Damian Kwaśny Daria Hemmerling 31 11 0 02 Dec 2020
A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data Weicheng Cai Ming Li 144 4 0 01 Dec 2020
Low Bandwidth Video-Chat Compression using Deep Generative Models Maxime Oquab Pierre Stock Oran Gafni Daniel Haziza Tao Xu ... Yana Hasson Patrick Labatut Bobo Bose-Kolanu T. Peyronel Camille Couprie 3DH 76 44 0 01 Dec 2020
Look who's not talking Youngki Kwon Hee-Soo Heo Jaesung Huh Bong-Jin Lee Joon Son Chung 36 29 0 30 Nov 2020
How Far Are We from Robust Voice Conversion: A Survey Tzu-hsien Huang Jheng-hao Lin Chien-yu Huang Hung-yi Lee 96 25 0 24 Nov 2020
Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification Xiaoyi Qin Yaogen Yang Lin Yang Xuyang Wang Junjie Wang Ming Li 49 0 0 21 Nov 2020
FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances Ali Shahin Shamsabadi Francisco Teixeira A. Abad Bhiksha Raj Andrea Cavallaro Isabel Trancoso AAML 62 30 0 17 Nov 2020
Image Animation with Perturbed Masks Yoav Shalev Lior Wolf DiffM VGen 19 8 0 13 Nov 2020
Supervised attention for speaker recognition Seong Min Kye Joon Son Chung Hoirin Kim 94 11 0 10 Nov 2020