ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.05879
  4. Cited By
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

14 May 2019
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson
ArXivPDFHTML

Papers citing "AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss"

50 / 105 papers shown
Title
End-to-End Zero-Shot Voice Conversion with Location-Variable
  Convolutions
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
37
8
0
19 May 2022
Dictionary Attacks on Speaker Verification
Dictionary Attacks on Speaker Verification
Mirko Marras
Pawel Korus
Anubhav Jain
N. Memon
AAML
34
9
0
24 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by
  Disentangling Speakers
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
30
110
0
20 Apr 2022
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one
  voice conversion
Enhanced exemplar autoencoder with cycle consistency loss in any-to-one voice conversion
Weida Liang
Lantian Li
Wenqiang Du
Dong Wang
56
0
0
08 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement
  by Re-Synthesis
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
18
32
0
31 Mar 2022
HiFi-VC: High Quality ASR-Based Voice Conversion
HiFi-VC: High Quality ASR-Based Voice Conversion
A. Kashkin
I. Karpukhin
S. Shishkin
29
5
0
31 Mar 2022
Text-free non-parallel many-to-many voice conversion using normalising
  flows
Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt
Abdelhamid Ezzerg
Piotr Bilinski
Magdalena Proszewska
Kamil Pokora
Roberto Barra-Chicote
Daniel Korzekwa
36
14
0
15 Mar 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu
Chengxi Li
Yi Ren
Zhiying Zhu
Zhou Zhao
DiffM
35
16
0
27 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP
  ADD Challenge
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
Ziyi Chen
Hua Hua
Yuxiang Zhang
Ming Li
Pengyuan Zhang
27
0
0
29 Jan 2022
Noise-robust voice conversion with domain adversarial training
Noise-robust voice conversion with domain adversarial training
Hongqiang Du
Lei Xie
Haizhou Li
19
11
0
26 Jan 2022
Invertible Voice Conversion
Invertible Voice Conversion
Zexin Cai
Ming Li
BDL
38
1
0
26 Jan 2022
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face
  Attributes Neural Rendering
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering
Shunyu Yao
Ruizhe Zhong
Yichao Yan
Guangtao Zhai
Xiaokang Yang
CVBM
32
90
0
03 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech
  representation for expressive voice conversion
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
15
14
0
02 Jan 2022
Training Robust Zero-Shot Voice Conversion Models with Self-supervised
  Features
Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features
Trung D. Q. Dang
Dung T. Tran
Peter Chin
K. Koishida
SSL
19
15
0
08 Dec 2021
Zero-shot Singing Technique Conversion
Zero-shot Singing Technique Conversion
Brendan O'Connor
S. Dixon
Georgy Fazekas
35
5
0
16 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice
  Conversion
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
Damien Ronssin
Milos Cernak
20
10
0
12 Nov 2021
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System
  for Both Human Beings and Machines
SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines
Haozhe Zhang
Zexin Cai
Xiaoyi Qin
Ming Li
54
15
0
06 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice
  Conversion
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
27
111
0
03 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from
  Self-Supervised Representations
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi
Juheon Lee
W. Kim
Jie Hwan Lee
Hoon Heo
Kyogu Lee
37
151
0
27 Oct 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation
  Learning
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
29
11
0
27 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive
  Voice Conversion
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
18
24
0
20 Oct 2021
Toward Degradation-Robust Voice Conversion
Toward Degradation-Robust Voice Conversion
Chien-yu Huang
Kai-Wei Chang
Hung-yi Lee
38
7
0
14 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial
  learning of conversions
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions
F. Bous
L. Benaroya
Nicolas Obin
Axel Roebel
19
2
0
07 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative
  Sequence Models
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
Yuanxun Lu
Jinxiang Chai
Xun Cao
29
82
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the
  Real World
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
40
39
0
20 Sep 2021
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent
  Adversarial Networks
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Russell Sammut Bonnici
C. Saitis
Martin Benning
GAN
36
15
0
05 Sep 2021
StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized
  by Automatic Speech Recognition
StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition
Shoki Sakamoto
Akira Taniguchi
T. Taniguchi
Hirokazu Kameoka
BDL
31
5
0
10 Aug 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for
  Natural-Sounding Voice Conversion
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
35
99
0
21 Jul 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised
  Speech Representation Disentanglement for One-shot Voice Conversion
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Disong Wang
Liqun Deng
Y. Yeung
Xiao Chen
Xunying Liu
Helen Meng
DRL
22
136
0
18 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised
  Pretrained Representations
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Jheng-hao Lin
Yist Y. Lin
C. Chien
Hung-yi Lee
30
56
0
07 Apr 2021
Lightweight and interpretable neural modeling of an audio distortion
  effect using hyperconditioned differentiable biquads
Lightweight and interpretable neural modeling of an audio distortion effect using hyperconditioned differentiable biquads
S. Nercessian
Andy M. Sarroff
K. Werner
17
29
0
15 Mar 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in
  Frames
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
38
57
0
25 Feb 2021
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang
Wenshuo Ge
Xiong Wang
Shan Yang
Wendong Gan
Haitao Chen
Hai Li
Lei Xie
Xiulin Li
CVBM
36
32
0
17 Nov 2020
Optimizing voice conversion network with cycle consistency loss of
  speaker identity
Optimizing voice conversion network with cycle consistency loss of speaker identity
Hongqiang Du
Xiaohai Tian
Lei Xie
Haizhou Li
21
17
0
17 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech
  Synthesis via Phone-Level Content-Style Disentanglement
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
29
21
0
08 Nov 2020
Semi-supervised Learning for Singing Synthesis Timbre
Semi-supervised Learning for Singing Synthesis Timbre
J. Bonada
Merlijn Blaauw
27
4
0
05 Nov 2020
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and
  Adaptive Instance Normalization
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization
Yen-Hao Chen
Da-Yi Wu
Tsung-Han Wu
Hung-yi Lee
34
107
0
31 Oct 2020
PPG-based singing voice conversion with adversarial representation
  learning
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
19
37
0
28 Oct 2020
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram
  Conversion
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
29
78
0
22 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge
  2020: Cascading ASR and TTS
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Wen-Chin Huang
Tomoki Hayashi
Shinji Watanabe
T. Toda
DRL
15
39
0
06 Oct 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and
  cross-lingual voice conversion
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
Yi Zhao
Wen-Chin Huang
Xiaohai Tian
Junichi Yamagishi
Rohan Kumar Das
Tomi Kinnunen
Zhenhua Ling
T. Toda
27
206
0
28 Aug 2020
VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net
  architecture
VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture
Da-Yi Wu
Yen-Hao Chen
Hung-yi Lee
10
99
0
07 Jun 2020
Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Da-Yi Wu
Yi-Hsuan Yang
GAN
14
8
0
28 May 2020
Contrastive Predictive Coding Supported Factorized Variational
  Autoencoder for Unsupervised Learning of Disentangled Speech Representations
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations
Janek Ebbers
Michael Kuhlmann
Tobias Cord-Landwehr
Reinhold Haeb-Umbach
DRL
CoGe
SSL
31
4
0
26 May 2020
Many-to-Many Voice Transformer Network
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
T. Toda
ViT
30
30
0
18 May 2020
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice
  Conversion
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion
Kun Zhou
Berrak Sisman
Mingyang Zhang
Haizhou Li
30
52
0
13 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice
  Conversion without Parallel Data
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Seung-won Park
Doo-young Kim
Myun-chul Joe
26
40
0
07 May 2020
Zero-Shot Learning and its Applications from Autonomous Vehicles to
  COVID-19 Diagnosis: A Review
Zero-Shot Learning and its Applications from Autonomous Vehicles to COVID-19 Diagnosis: A Review
Mahdi Rezaei
Mahsa Shahidi
29
53
0
29 Apr 2020
Previous
123
Next