1811.00002
WaveGlow: A Flow-based Generative Network for Speech Synthesis
31 October 2018
R. Prenger
Rafael Valle
Bryan Catanzaro
Papers citing
"WaveGlow: A Flow-based Generative Network for Speech Synthesis"
50 / 525 papers shown
Title
Combined Generative and Predictive Modeling for Speech Super-resolution
Heming Wang
Eric W. Healy
DeLiang Wang
DiffM
33
0
0
25 Jan 2024
Contractive Diffusion Probabilistic Models
Wenpin Tang
Hanyang Zhao
DiffM
49
12
0
23 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
26
1
0
19 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
44
5
0
18 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
52
1
0
16 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
21
36
0
14 Jan 2024
Incremental FastPitch: Chunk-based High Quality Text to Speech
Muyang Du
Chuan Liu
Junjie Lai
23
0
0
03 Jan 2024
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
26
17
0
22 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
28
0
15 Dec 2023
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Raviraj Joshi
Nikesh Garera
33
0
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
Raviraj Joshi
Nikesh Garera
27
0
0
02 Dec 2023
THInImg: Cross-modal Steganography for Presenting Talking Heads in Images
Lin Zhao
Hongxuan Li
Xuefei Ning
Xinru Jiang
35
1
0
28 Nov 2023
Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Zhizheng Wu
29
11
0
25 Nov 2023
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness
Mathias Vogel
DiffM
45
0
0
17 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
41
29
0
10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
27
0
0
08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
28
0
0
07 Nov 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
Hsin-Min Wang
34
5
0
05 Nov 2023
Flexible Tails for Normalising Flows, with Application to the Modelling of Financial Return Data
Tennessee Hickling
Dennis Prangle
24
4
0
01 Nov 2023
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
55
0
0
27 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
27
31
0
25 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
13
3
0
22 Oct 2023
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
26
0
0
19 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
46
12
0
08 Oct 2023
Unified speech and gesture synthesis using flow matching
Shivam Mehta
Ruibo Tu
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
45
3
0
08 Oct 2023
VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023
Yi-Hua Zhou
Meng Chen
Yi Lei
Jihua Zhu
Weifeng Zhao
21
5
0
08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
24
0
0
08 Oct 2023
VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model
Yayun He
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
DiffM
19
2
0
07 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
31
13
0
02 Oct 2023
Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer
Peter Ochieng
DiffM
30
0
0
18 Sep 2023
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
Xin Wang
Junichi Yamagishi
SyDa
58
23
0
12 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
24
72
0
06 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
18
11
0
06 Sep 2023
Generative-based Fusion Mechanism for Multi-Modal Tracking
Zhangyong Tang
Tianyang Xu
Xuefeng Zhu
Xiaojun Wu
Josef Kittler
DiffM
26
31
0
04 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
27
21
0
29 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
32
5
0
29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
35
38
0
24 Aug 2023
WavMark: Watermarking for Audio Generation
Guang Chen
Yu-Huan Wu
Shujie Liu
Tao Liu
Xiaoyong Du
Furu Wei
25
33
0
24 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
33
4
0
14 Aug 2023
Image Synthesis under Limited Data: A Survey and Taxonomy
Mengping Yang
Zhe Wang
28
8
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
36
0
31 Jul 2023
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase
Yoshiki Masuyama
Natsuki Ueno
Nobutaka Ono
14
1
0
23 Jul 2023
PartDiff: Image Super-resolution with Partial Diffusion Models
Kai Zhao
A. Hung
Kai-Lin Pang
Haoxin Zheng
Kyunghyun Sung
DiffM
MedIm
25
3
0
21 Jul 2023
Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
Chin-Yun Yu
Gyorgy Fazekas
33
7
0
29 Jun 2023
MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning
Mohammad Reza Hasanabadi
19
3
0
22 Jun 2023
HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising
Li Pang
Weizhen Gu
Xiangyong Cao
Xiangyu Rui
Jiangjun Peng
Shuang Xu
Gang Yang
Deyu Meng
17
0
0
20 Jun 2023
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction
Wenzhe Liu
Yupeng Shi
Jun Chen
Wei Rao
Shulin He
Andong Li
Yannan Wang
Zhiyong Wu
24
6
0
14 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
38
8
0
12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
56
288
0
11 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
42
29
0
09 Jun 2023