2107.10394
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
21 July 2021
Yinghao Aaron Li
A. Zare
N. Mesgarani
Papers citing
"StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion"
50 / 60 papers shown
Title
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
48
0
0
27 Apr 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
K. Chen
Matthew Baas
Shlomo Dubnov
23
0
0
08 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
32
0
0
08 Apr 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
61
0
0
11 Mar 2025
Emotion Recognition and Generation: A Comprehensive Review of Face, Speech, and Text Modalities
Rebecca Mobbs
Dimitrios Makris
Vasileios Argyriou
43
0
0
02 Feb 2025
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar
Ishan D. Biyani
Nirmesh J. Shah
Pankaj Wasnik
R. Shah
DiffM
26
0
0
31 Dec 2024
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
44
3
0
26 Dec 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
E. Burnaev
OT
40
1
0
17 Oct 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
56
3
0
23 Sep 2024
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
Chien-Chun Wang
Li-Wei Chen
Cheng-Kang Chou
Hung-Shin Lee
Berlin Chen
Hsin-Min Wang
26
0
0
19 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
5
0
16 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
40
1
0
10 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
40
0
0
03 Sep 2024
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
21
0
0
03 Sep 2024
Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
Chien-Chun Wang
Li-Wei Chen
Hung-Shin Lee
Berlin Chen
Hsin-Min Wang
32
1
0
03 Sep 2024
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
Octavian Pascu
Dan Oneaţă
H. Cucu
Nicolas M. Muller
48
1
0
28 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
44
1
0
20 Aug 2024
Disentangling segmental and prosodic factors to non-native speech comprehensibility
Waris Quamer
Ricardo Gutierrez-Osuna
37
1
0
20 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
38
0
16 Aug 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
41
2
0
13 Aug 2024
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
Yu-Hua Chen
Woosung Choi
Wei-Hsiang Liao
Marco A. Martínez-Ramírez
K. Cheuk
Yuki Mitsufuji
J. Jang
Yi-Hsuan Yang
50
5
0
22 Jun 2024
SilentCipher: Deep Audio Watermarking
Mayank Kumar Singh
Naoya Takahashi
Weihsiang Liao
Yuki Mitsufuji
43
7
0
06 Jun 2024
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
Pengcheng Li
Jianzong Wang
Xulong Zhang
Yong Zhang
Jing Xiao
Ning Cheng
DRL
41
1
0
02 May 2024
AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir D. Memon
37
0
0
28 Feb 2024
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
Nicolas M. Muller
Piotr Kawa
Wei Herng Choong
Edresson Casanova
Eren Golge
Thorsten Muller
P. Syga
Philip Sperl
Konstantin Böttinger
42
35
0
17 Jan 2024
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion
Yun Chen
Lingxiao Yang
Qi Chen
Jianhuang Lai
Xiaohua Xie
29
3
0
29 Dec 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen An Li
Tsung-Yuan Hsu
T. Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
18
5
0
25 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
35
4
0
18 Sep 2023
Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh
Yamini Sinha
Ingo Siegert
Sebastian Stober
11
1
0
15 Sep 2023
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
Arnab Das
Suhita Ghosh
Tim Polzehl
Sebastian Stober
30
4
0
14 Sep 2023
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
Suhita Ghosh
Arnab Das
Yamini Sinha
Ingo Siegert
Tim Polzehl
Sebastian Stober
22
4
0
14 Sep 2023
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
Yinghao Aaron Li
Cong Han
N. Mesgarani
28
5
0
18 Jul 2023
Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound
Hanbo Cai
Pengcheng Zhang
Hai Dong
Yan Xiao
Stefanos Koffas
Yiming Li
AAML
29
28
0
17 Jul 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
16
46
0
26 Jun 2023
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu
Shijun Wang
Ning Chen
DRL
22
2
0
21 Jun 2023
HumanDiffusion: diffusion model using perceptual gradients
Yota Ueda
Shinnosuke Takamichi
Yuki Saito
Norihiro Takamune
Hiroshi Saruwatari
DiffM
16
0
0
21 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
37
107
0
13 Jun 2023
Discussion Paper: The Threat of Real Time Deepfakes
Guy Frankovits
Yisroel Mirsky
11
5
0
04 Jun 2023
Iteratively Improving Speech Recognition and Voice Conversion
Mayank Singh
Naoya Takahashi
Ono Naoyuki
13
4
0
24 May 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Ziqian Ning
Yuepeng Jiang
Pengcheng Zhu
Jixun Yao
Shuai Wang
Linfu Xie
Mengxiao Bi
34
10
0
21 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
34
1
0
09 May 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
56
2
0
27 Feb 2023
Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Nirmesh J. Shah
M. Singh
Naoya Takahashi
N. Onoe
49
13
0
21 Feb 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
25
22
0
20 Jan 2023
Deepfake CAPTCHA: A Method for Preventing Fake Calls
Lior Yasur
Guy Frankovits
Fred M. Grabovski
Yisroel Mirsky
33
11
0
08 Jan 2023
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
19
18
0
29 Dec 2022
VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion
Hanbo Cai
Pengcheng Zhang
Hai Dong
Yan Xiao
Shunhui Ji
15
5
0
20 Dec 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
25
8
0
20 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
21
16
0
14 Oct 2022