Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

25 October 2019
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim

Papers citing "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram"

50 / 464 papers shown
Title
BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks
Eloi Moliner
Vesa Valimaki
59
19
0
13 Apr 2022
A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture
Zhe-ming Lu
Mengnan He
Ruixiong Zhang
Caixia Gong
GAN
23
2
0
12 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022
Jiameng Gao
56
0
0
08 Apr 2022
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification
Sonal Joshi
Saurabh Kataria
Jesus Villalba
Najim Dehak
AAML
86
7
0
08 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
45
0
0
05 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
56
0
0
01 Apr 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang
Kaitao Song
Xu Tan
Daxin Tan
Yuzi Yan
...
G. Wang
Wei Zhou
Tao Qin
Tan Lee
Sheng Zhao
SSL
84
21
0
31 Mar 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
93
53
0
31 Mar 2022
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Zexu Pan
Meng Ge
Haizhou Li
72
20
0
31 Mar 2022
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
Yuma Koizumi
Heiga Zen
Kohei Yatabe
Nanxin Chen
M. Bacchiani
DiffM
99
49
0
31 Mar 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
Zijiang Yang
Xin Jing
Andreas Triantafyllopoulos
Meishu Song
Ilhan Aslan
Björn W. Schuller
64
14
0
29 Mar 2022
Mel Frequency Spectral Domain Defenses against Adversarial Attacks on Speech Recognition Systems
Nicholas Mehlman
Anirudh Sreeram
Raghuveer Peri
Shrikanth Narayanan
AAML
165
4
0
29 Mar 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
Max W. Y. Lam
Jun Wang
Jane Polak Scowcroft
Dong Yu
DiffM
98
97
0
25 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Gašper Beguš
Alan Zhou
SSL
115
5
0
22 Mar 2022
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling
Bac Nguyen
Fabien Cardinaux
Stefan Uhlich
23
2
0
21 Mar 2022
AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan
Yongbin Feng
Mingming Ye
Cheng Tuo
Minghang Zhang
117
3
0
18 Mar 2022
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
119
53
0
18 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
99
29
0
07 Mar 2022
NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation
Tao Wang
Ruibo Fu
Jiangyan Yi
J. Tao
Zhengqi Wen
21
2
0
05 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
89
62
0
04 Mar 2022
MANNER: Multi-view Attention Network for Noise Erasure
Hyun Joon Park
Byung Ha Kang
Wooseok Shin
Jin Sob Kim
S. W. Han
92
50
0
04 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
72
3
0
02 Mar 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
131
4
0
01 Mar 2022
Revisiting Over-Smoothness in Text to Speech
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
146
64
0
26 Feb 2022
Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph
Dacheng Yin
Xuanchi Ren
Chong Luo
Yuwang Wang
Zhiwei Xiong
Wenjun Zeng
114
13
0
24 Feb 2022
Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement
Doyeon Kim
Hyewon Han
Hyeon-Kyeong Shin
Soo-Whan Chung
Hong-Goo Kang
23
5
0
24 Feb 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
KELM
69
20
0
21 Feb 2022
It's Raw! Audio Generation with State-Space Models
Karan Goel
Albert Gu
Chris Donahue
Christopher Ré
89
195
0
20 Feb 2022
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation
Disong Wang
Songxiang Liu
Xixin Wu
Hui Lu
Lifa Sun
Xunying Liu
Helen Meng
54
5
0
18 Feb 2022
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion
Disong Wang
Shan Yang
Jane Polak Scowcroft
Xunying Liu
Dong Yu
Helen Meng
60
11
0
18 Feb 2022
On loss functions and evaluation metrics for music source separation
Enric Gusó
Jordi Pons
Santiago Pascual
Joan Serrà
132
21
0
16 Feb 2022
Speech Denoising in the Waveform Domain with Self-Attention
Zhifeng Kong
Ming-Yu Liu
Ambrish Dantrey
Bryan Catanzaro
89
63
0
15 Feb 2022
textless-lib: a Library for Textless Spoken Language Processing
Eugene Kharitonov
Jade Copet
Kushal Lakhotia
Tu Nguyen
Paden Tomasello
...
A. Elkahky
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
121
34
0
15 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
77
58
0
14 Feb 2022
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training
Zehua Chen
Xu Tan
Ke Wang
Shifeng Pan
Danilo Mandic
Lei He
Sheng Zhao
DiffM
69
31
0
08 Feb 2022
PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech
Srikanth Korse
N. Pia
Kishan Gupta
Guillaume Fuchs
91
15
0
31 Jan 2022
ItôWave: Itô Stochastic Differential Equation Is All You Need For Wave Generation
Shoule Wu
Ziqiang Shi
DiffM
451
9
0
29 Jan 2022
Noise-robust voice conversion with domain adversarial training
Hongqiang Du
Lei Xie
Haizhou Li
66
12
0
26 Jan 2022
Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals
Haohan Guo
Zhiping Zhou
Fanbo Meng
Kai-Chun Liu
97
16
0
25 Jan 2022
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end
Rem Hida
Masaki Hamada
Chie Kamada
E. Tsunoo
Toshiyuki Sekiya
Toshiyuki Kumakura
34
7
0
24 Jan 2022
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
81
13
0
15 Jan 2022
MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder
Shoutong Wang
Jinglin Liu
Yi Ren
Zhen Wang
Changliang Xu
Zhou Zhao
40
7
0
11 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
172
58
0
10 Jan 2022
Improved Input Reprogramming for GAN Conditioning
Tuan Dinh
Daewon Seo
Zhixu Du
Liang Shang
Kangwook Lee
AI4CE
103
8
0
07 Jan 2022
Audio representations for deep learning in sound synthesis: A review
Anastasia Natsiou
Seán O'Leary
AI4TS
65
18
0
07 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
85
14
0
02 Jan 2022
Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent
Yi Li
Yang Sun
S. M. Naqvi
62
1
0
21 Dec 2021
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
Rongjie Huang
Feiyang Chen
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
94
104
0
20 Dec 2021
Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features
Trung D. Q. Dang
Dung T. Tran
Peter Chin
K. Koishida
SSL
69
15
0
08 Dec 2021
Dilated convolution with learnable spacings
Ismail Khalfaoui-Hassani
Thomas Pellegrini
T. Masquelier
123
32
0
07 Dec 2021