Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 October 2019
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim

Papers citing "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram"

50 / 464 papers shown
VocBench: A Neural Vocoder Benchmark for Speech Synthesis
Ehab A. AlBadawy
Andrew Gibiansky
Qing He
Jilong Wu
Ming-Ching Chang
Siwei Lyu
58
12
0
06 Dec 2021
Steerable discovery of neural audio effects
C. Steinmetz
Joshua D. Reiss
52
6
0
06 Dec 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Aimilios Chalamandaris
Georgia Maniati
Panos Kakoulidis
S. Raptis
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
139
37
0
17 Nov 2021
AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
Damien Ronssin
Milos Cernak
78
11
0
12 Nov 2021
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis
Antoine Caillon
P. Esling
DRL
68
112
0
09 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
77
30
0
07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schönherr
DiffM
204
131
0
04 Nov 2021
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Shengyuan Xu
Wenxiao Zhao
Jing Guo
63
12
0
01 Nov 2021
Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution
Jaechang Kim
Yunjoo Lee
Seunghoon Hong
Jungseul Ok
SupR, CLL
70
13
0
30 Oct 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Shijun Wang
Dimche Kostadinov
Damian Borth
83
11
0
27 Oct 2021
TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
Viet-Anh Nguyen
Anh H. T. Nguyen
Andy W. H. Khong
60
22
0
26 Oct 2021
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion
Zongyang Du
Berrak Sisman
Kun Zhou
Haizhou Li
91
24
0
20 Oct 2021
Speech Enhancement-assisted Voice Conversion in Noisy Environments
Yun-Ju Chan
Chiang-Jen Peng
Syu-Siang Wang
Hsin-Min Wang
Yu Tsao
T. Chi
46
2
0
19 Oct 2021
Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks
Marco Comunità
Huy Phan
Joshua D. Reiss
GAN
49
11
0
18 Oct 2021
KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke
Xiaobin Zhuang
Huiran Yu
Weifeng Zhao
Tao Jiang
Peng Hu
90
6
0
18 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM, VGen
99
43
0
15 Oct 2021
Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang
B. Halpern
Lester Phillip Violeta
O. Scharenborg
Tomoki Toda
106
23
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
85
63
0
15 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
203
63
0
14 Oct 2021
FedSpeech: Federated Text-to-Speech with Continual Learning
Ziyue Jiang
Yi Ren
Ming Lei
Zhou Zhao
FedML
166
28
0
14 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
162
202
0
14 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
67
14
0
13 Oct 2021
Source Mixing and Separation Robust Audio Steganography
Naoya Takahashi
M. Singh
Yuki Mitsufuji
60
6
0
11 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Axel Roebel
F. Bous
56
2
0
07 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
46
2
0
06 Oct 2021
Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet
Max Morrison
Zeyu Jin
Nicholas J. Bryan
Juan-Pablo Caceres
Bryan Pardo
73
14
0
05 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai
Erica Cooper
Yang Zhang
Shiyu Chang
Kaizhi Qian
...
Yung-Sung Chuang
Alexander H. Liu
Junichi Yamagishi
David D. Cox
James R. Glass
69
6
0
04 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
122
79
0
30 Sep 2021
VoiceFixer: Toward General Speech Restoration with Neural Vocoder
Haohe Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
87
58
0
28 Sep 2021
MSR-NV: Neural Vocoder Using Multiple Sampling Rates
Kentaro Mitsui
Kei Sawada
109
0
0
28 Sep 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Manh Luong
Viet-Anh Tran
24
2
0
27 Sep 2021
Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion
Yi-Syuan Liou
Wen-Chin Huang
Ming-Chi Yen
S. Tsai
Yu-Huai Peng
Tomoki Toda
Yu Tsao
Hsin-Min Wang
66
1
0
08 Sep 2021
Bilateral Denoising Diffusion Models
Max W. Y. Lam
Jun Wang
Rongjie Huang
Jane Polak Scowcroft
Dong Yu
DiffM
83
43
0
26 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
162
6
0
16 Aug 2021
Masked Acoustic Unit for Mispronunciation Detection and Correction
Zhan Zhang
Yuehai Wang
Jianyi Yang
116
3
0
12 Aug 2021
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate
Ahmed Mustafa
Jan Büthe
Srikanth Korse
Kishan Gupta
Guillaume Fuchs
N. Pia
129
19
0
09 Aug 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning
Guangyan Zhang
Ying Qin
Daxin Tan
Tan Lee
77
4
0
05 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
J. Nistal
Stefan Lattner
G. Richard
74
9
0
03 Aug 2021
Creation and Detection of German Voice Deepfakes
Vanessa Barnekow
Dominik Binder
Niclas Kromrey
Pascal Munaretto
A. Schaad
Felix Schmieder
23
3
0
02 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
54
7
0
01 Aug 2021
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
Huiyan Li
Haohong Lin
You Wang
Hengyang Wang
Ming Zhang
Han Gao
Qing Ai
Zhiyuan Luo
Guang Li
63
14
0
31 Jul 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
97
101
0
21 Jul 2021
Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
Joanna Rownicka
Kilian Sprenkamp
A. Tripiana
Volodymyr Gromoglasov
Timo P. Kunz
26
0
0
21 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
73
9
0
20 Jul 2021
SVSNet: An End-to-end Speaker Voice Similarity Assessment Model
Cheng-Hung Hu
Yu-Huai Peng
Junichi Yamagishi
Yu Tsao
Hsin-Min Wang
48
5
0
20 Jul 2021
Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech
C. Steinmetz
V. Ithapu
P. Calamia
83
40
0
15 Jul 2021
Neural Waveshaping Synthesis
B. Hayes
C. Saitis
Gyorgy Fazekas
85
28
0
11 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
91
41
0
04 Jul 2021
Adversarial Sample Detection for Speaker Verification by Neural Vocoders
Haibin Wu
Po-Chun Hsu
Ji Gao
Shanshan Zhang
Shen Huang
Jian Kang
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML
93
21
0
01 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021