The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

12 April 2018

Papers citing "The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods"

FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning Ju Yeon Kang J. Yoon Semin Kim Min Hyun Han Nam Soo Kim 32 0 0 22 Apr 2025
Harder or Different? Understanding Generalization of Audio Deepfake Detection Nicolas M. Muller Nicholas W. D. Evans Hemlata Tak Philip Sperl Konstantin Böttinger 29 3 0 05 Jun 2024
Audio Anti-Spoofing Detection: A Survey Menglu Li Yasaman Ahmadiadli Xiao-Ping Zhang 48 17 0 22 Apr 2024
Explainable Deepfake Video Detection using Convolutional Neural Network and CapsuleNet Gazi Hasin Ishrak Zalish Mahmud Md. Zami Al Zunaed Farabe Tahera Khanom Tinni Tanzim Reza Mohammad Zavid Parvez 48 3 0 19 Apr 2024
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing Hye-jin Shim Jee-weon Jung Tomi Kinnunen 21 13 0 31 May 2023
Voice conversion with limited data and limitless data augmentations Olga Slizovskaia Jordi Janer Pritish Chandna Oscar Mayor 30 1 0 27 Dec 2022
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech Kyumin Park Keon Lee Daeyoung Kim Dongyeop Kang 26 0 0 26 Oct 2022
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild Xuechen Liu Xin Wang Md. Sahidullah J. Patino Héctor Delgado ... Massimiliano Todisco Junichi Yamagishi Nicholas W. D. Evans A. Nautsch Kong Aik Lee 40 173 0 05 Oct 2022
Controllable Accented Text-to-Speech Synthesis Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 34 6 0 22 Sep 2022
Deepfake: Definitions, Performance Metrics and Standards, Datasets and Benchmarks, and a Meta-Review Enes ALTUNCU V. N. Franqueira Shujun Li 28 11 0 21 Aug 2022
Comparison of Speech Representations for the MOS Prediction System A. Kunikoshi Jaebok Kim Won-Suk Jun K. Sjölander 13 1 0 28 Jun 2022
Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones Yoshiki Masuyama K. Yamaoka Nobutaka Ono 28 5 0 27 Jun 2022
Fusion of Self-supervised Learned Models for MOS Prediction Zhengdong Yang Wangjin Zhou Chenhui Chu Sheng Li Raj Dabre Raphaël Rubino Yi Zhao 25 28 0 11 Apr 2022
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement Pavel Andreev Aibek Alanov Oleg Ivanov Dmitry Vetrov 38 38 0 24 Mar 2022
NORESQA: A Framework for Speech Quality Assessment using Non-Matching References Pranay Manocha Buye Xu Anurag Kumar 35 44 0 16 Sep 2021
SVSNet: An End-to-end Speaker Voice Similarity Assessment Model Cheng-Hung Hu Yu-Huai Peng Junichi Yamagishi Yu Tsao Hsin-Min Wang 29 5 0 20 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn? Nicolas M. Muller Franziska Dieckmann Pavel Czempin Roman Canals Konstantin Böttinger Jennifer Williams 35 70 0 23 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model Chenye Cui Yi Ren Jinglin Liu Feiyang Chen Rongjie Huang Ming Lei Zhou Zhao 24 35 0 17 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD Kun Zhou Berrak Sisman Rui Liu Haizhou Li 25 168 0 31 May 2021
Deep Learning Based Assessment of Synthetic Speech Naturalness Gabriel Mittag Sebastian Möller 20 62 0 23 Apr 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Nobukatsu Hojo 33 57 0 25 Feb 2021
CDPAM: Contrastive learning for perceptual audio similarity Pranay Manocha Zeyu Jin Richard Y. Zhang Adam Finkelstein 27 68 0 09 Feb 2021
Optimizing voice conversion network with cycle consistency loss of speaker identity Hongqiang Du Xiaohai Tian Lei Xie Haizhou Li 21 17 0 17 Nov 2020
Low-resource expressive text-to-speech using data augmentation Goeric Huybrechts Thomas Merritt Giulia Comini Bartek Perz Raahil Shah Jaime Lorenzo-Trueba 26 50 0 11 Nov 2020
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion Takuhiro Kaneko Hirokazu Kameoka Kou Tanaka Nobukatsu Hojo 26 78 0 22 Oct 2020
Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions Rohan Kumar Das Tomi Kinnunen Wen-Chin Huang Zhenhua Ling Junichi Yamagishi Yi Zhao Xiaohai Tian T. Toda 31 52 0 08 Sep 2020
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling Yeunju Choi Youngmoon Jung Hoirin Kim 14 26 0 09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning Berrak Sisman Junichi Yamagishi Simon King Haizhou Li BDL 41 317 0 09 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion Wen-Chin Huang Tomoki Hayashi Yi-Chiao Wu Hirokazu Kameoka T. Toda 27 38 0 07 Aug 2020
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification Yeunju Choi Youngmoon Jung Hoirin Kim 16 27 0 16 Jul 2020
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network Yi-Chiao Wu Tomoki Hayashi Patrick Lumban Tobing Kazuhiro Kobayashi T. Toda 27 18 0 11 Jul 2020
DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices Run Wang Felix Juefei Xu Yihao Huang Qing Guo Xiaofei Xie Lei Ma Yang Liu AAML 25 105 0 28 May 2020
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation Yi-Chiao Wu Tomoki Hayashi T. Okamoto Hisashi Kawai T. Toda 29 4 0 18 May 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda ViT 30 30 0 18 May 2020
Introducing the VoicePrivacy Initiative N. Tomashenko B. M. L. Srivastava Xin Wang Emmanuel Vincent A. Nautsch ... Nicholas W. D. Evans J. Patino J. Bonastre Paul-Gauthier Noé Massimiliano Todisco 40 127 0 04 May 2020
The Attacker's Perspective on Automatic Speaker Verification: An Overview Rohan Kumar Das Xiaohai Tian Tomi Kinnunen Haizhou Li AAML 20 81 0 19 Apr 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation Arya D. McCarthy Liezl Puzon J. Pino 31 24 0 27 Feb 2020
Many-to-Many Voice Conversion using Conditional Cycle-Consistent Adversarial Networks Shindong Lee Bonggu Ko Keonnyeong Lee In-Chul Yoo Dongsuk Yook GAN 30 33 0 15 Feb 2020
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion Wen-Chin Huang Hao Luo Hsin-Te Hwang Chen-Chou Lo Yu-Huai Peng Yu Tsao Hsin-Min Wang DRL 17 42 0 22 Jan 2020
Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos H. Nguyen Fuming Fang Junichi Yamagishi Isao Echizen AAML CVBM 26 424 0 17 Jun 2019
Voice Mimicry Attacks Assisted by Automatic Speaker Verification Ville Vestman Tomi Kinnunen Rosa González Hautamäki Md. Sahidullah 34 37 0 03 Jun 2019
TTS Skins: Speaker Conversion via ASR Adam Polyak Lior Wolf Yaniv Taigman 18 27 0 18 Apr 2019
Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation Shogo Seki Hirokazu Kameoka Li Li T. Toda K. Takeda DRL 14 19 0 29 Sep 2018
ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder Hirokazu Kameoka Takuhiro Kaneko Kou Tanaka Nobukatsu Hojo DRL 16 59 0 13 Aug 2018
StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks Hirokazu Kameoka Takuhiro Kaneko Kou Tanaka Nobukatsu Hojo 34 370 0 06 Jun 2018
Collapsed speech segment detection and suppression for WaveNet vocoder Yi-Chiao Wu Kazuhiro Kobayashi Tomoki Hayashi Patrick Lumban Tobing T. Toda 7 25 0 30 Apr 2018