ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

21 July 2021
Yinghao Aaron Li
A. Zare
N. Mesgarani
arXiv:2107.10394

Papers citing "StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion"

50 / 60 papers shown
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
48
0
0
27 Apr 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
K. Chen
Matthew Baas
Shlomo Dubnov
23
0
0
08 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
32
0
0
08 Apr 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
61
0
0
11 Mar 2025
Emotion Recognition and Generation: A Comprehensive Review of Face, Speech, and Text Modalities
Rebecca Mobbs
Dimitrios Makris
Vasileios Argyriou
43
0
0
02 Feb 2025
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar
Ishan D. Biyani
Nirmesh J. Shah
Pankaj Wasnik
R. Shah
DiffM
26
0
0
31 Dec 2024
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
44
3
0
26 Dec 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
E. Burnaev
OT
40
1
0
17 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
56
3
0
23 Sep 2024
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
Chien-Chun Wang
Li-Wei Chen
Cheng-Kang Chou
Hung-Shin Lee
Berlin Chen
Hsin-Min Wang
26
0
0
19 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
5
0
16 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
40
1
0
10 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
40
0
0
03 Sep 2024
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
21
0
0
03 Sep 2024
Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
Chien-Chun Wang
Li-Wei Chen
Hung-Shin Lee
Berlin Chen
Hsin-Min Wang
32
1
0
03 Sep 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
48
1
0
28 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
44
1
0
20 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
37
1
0
20 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
38
0
16 Aug 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
41
2
0
13 Aug 2024
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
Yu-Hua Chen
Woosung Choi
Wei-Hsiang Liao
Marco A. Martínez-Ramírez
K. Cheuk
Yuki Mitsufuji
J. Jang
Yi-Hsuan Yang
50
5
0
22 Jun 2024
SilentCipher: Deep Audio Watermarking
Mayank Kumar Singh
Naoya Takahashi
Weihsiang Liao
Yuki Mitsufuji
43
7
0
06 Jun 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
41
1
0
02 May 2024
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir D. Memon
37
0
0
28 Feb 2024
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Muller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Muller
P. Syga
Philip Sperl
Konstantin Böttinger
42
35
0
17 Jan 2024
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
Yun Chen
Lingxiao Yang
Qi Chen
Jianhuang Lai
Xiaohua Xie
29
3
0
29 Dec 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
18
5
0
25 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
35
4
0
18 Sep 2023
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh
Yamini Sinha
Ingo Siegert
Sebastian Stober
11
1
0
15 Sep 2023
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
Arnab Das
Suhita Ghosh
Tim Polzehl
Sebastian Stober
30
4
0
14 Sep 2023
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh
Arnab Das
Yamini Sinha
Ingo Siegert
Tim Polzehl
Sebastian Stober
22
4
0
14 Sep 2023
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
Yinghao Aaron Li
Cong Han
N. Mesgarani
28
5
0
18 Jul 2023
Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound
Hanbo Cai
Pengcheng Zhang
Hai Dong
Yan Xiao
Stefanos Koffas
Yiming Li
AAML
29
28
0
17 Jul 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
16
46
0
26 Jun 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu
Shijun Wang
Ning Chen
DRL
22
2
0
21 Jun 2023
HumanDiffusion: diffusion model using perceptual gradients
Yota Ueda
Shinnosuke Takamichi
Yuki Saito
Norihiro Takamune
Hiroshi Saruwatari
DiffM
16
0
0
21 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
37
107
0
13 Jun 2023
Discussion Paper: The Threat of Real Time Deepfakes
Guy Frankovits
Yisroel Mirsky
11
5
0
04 Jun 2023
Iteratively Improving Speech Recognition and Voice Conversion
Mayank Singh
Naoya Takahashi
Ono Naoyuki
13
4
0
24 May 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Ziqian Ning
Yuepeng Jiang
Pengcheng Zhu
Jixun Yao
Shuai Wang
Linfu Xie
Mengxiao Bi
34
10
0
21 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
34
1
0
09 May 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
56
2
0
27 Feb 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
49
13
0
21 Feb 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
25
22
0
20 Jan 2023
Deepfake CAPTCHA: A Method for Preventing Fake Calls
Lior Yasur
Guy Frankovits
Fred M. Grabovski
Yisroel Mirsky
33
11
0
08 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
19
18
0
29 Dec 2022
VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion
Hanbo Cai
Pengcheng Zhang
Hai Dong
Yan Xiao
Shunhui Ji
15
5
0
20 Dec 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S V
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
25
8
0
20 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
21
16
0
14 Oct 2022