High Fidelity Speech Synthesis with Adversarial Networks

25 September 2019

Papers citing "High Fidelity Speech Synthesis with Adversarial Networks"

50 / 149 papers shown

Towards Error-Resilient Neural Speech Coding Huaying Xue Xiulian Peng Xue Jiang Yan Lu 21 7 0 03 Jul 2022
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers Liumeng Xue Shan Yang Na Hu Dan Su Linfu Xie 21 2 0 02 Jul 2022
Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models Fan Bao Chongxuan Li Jiacheng Sun Jun Zhu Bo Zhang DiffM 36 72 0 15 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training Sang-gil Lee Wei Ping Boris Ginsburg Bryan Catanzaro Sungroh Yoon 17 225 0 09 Jun 2022
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis Zhenzi Weng Zhijin Qin Xiaoming Tao Chengkang Pan Guangyi Liu Geoffrey Ye Li 35 132 0 09 May 2022
SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech Zhenhui Ye Zhou Zhao Yi Ren Fei Wu 29 27 0 25 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 23 0 0 08 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech Hyungchan Yoon Seyun Um Changwhan Kim Hong-Goo Kang 20 0 0 05 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 18 30 0 31 Mar 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis Max W. Y. Lam Jun Wang Dan Su Dong Yu DiffM 34 92 0 25 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data Gašper Beguš Alan Zhou SSL 24 4 0 22 Mar 2022
Reproducible Subjective Evaluation Max Morrison Brian Tang Gefei Tan Bryan Pardo 20 6 0 08 Mar 2022
Practical cognitive speech compression Reza Lotfidereshgi P. Gournay 32 2 0 08 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Takuhiro Kaneko Kou Tanaka Hirokazu Kameoka Shogo Seki 19 60 0 04 Mar 2022
Revisiting Over-Smoothness in Text to Speech Yi Ren Xu Tan Tao Qin Zhou Zhao Tie-Yan Liu 70 61 0 26 Feb 2022
It's Raw! Audio Generation with State-Space Models Karan Goel Albert Gu Chris Donahue Christopher Ré 14 186 0 20 Feb 2022
Attributable-Watermarking of Speech Generative Models Yongbaek Cho Changhoon Kim Yezhou Yang Yi Ren 14 6 0 17 Feb 2022
Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module Adam Gabryś Goeric Huybrechts M. Ribeiro C. Chien Julian Roth Giulia Comini Roberto Barra-Chicote Bartek Perz Jaime Lorenzo-Trueba 30 21 0 16 Feb 2022
InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training Zehua Chen Xu Tan Ke Wang Shifeng Pan Danilo Mandic Lei He Sheng Zhao DiffM 25 28 0 08 Feb 2022
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis Shinnosuke Takamichi Wataru Nakata Naoko Tanji Hiroshi Saruwatari AuLLM 25 6 0 26 Jan 2022
Improved Input Reprogramming for GAN Conditioning Tuan Dinh Daewon Seo Zhixu Du Liang Shang Kangwook Lee AI4CE 22 8 0 07 Jan 2022
Audio representations for deep learning in sound synthesis: A review Anastasia Natsiou Seán O'Leary AI4TS 19 18 0 07 Jan 2022
Semantic Communications: Principles and Challenges Zhijin Qin Xiaoming Tao Jianhua Lu Wen Tong Geoffrey Ye Li 31 338 0 30 Dec 2021
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus Rongjie Huang Feiyang Chen Yi Ren Jinglin Liu Chenye Cui Zhou Zhao 28 98 0 20 Dec 2021
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 185 378 0 04 Dec 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance Heeseung Kim Sungwon Kim Sungroh Yoon DiffM BDL 19 107 0 23 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Aimilios Chalamandaris Georgia Maniati Panos Kakoulidis S. Raptis June Sig Sung Hyoungmin Park Pirros Tsiakoulis 6 36 0 17 Nov 2021
Generating Diverse Realistic Laughter for Interactive Art Mehdi Afsar Eric Park Étienne Paquette Gauthier Gidel Kory W. Mathewson Eilif B. Muller 20 7 0 04 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schönherr DiffM 129 123 0 04 Nov 2021
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units Anurag Katakkar A. Black AuLLM 27 1 0 31 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis Max Morrison Rithesh Kumar Kundan Kumar Prem Seetharaman Aaron Courville Yoshua Bengio GAN 36 68 0 19 Oct 2021
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection Zhenyu Zhang Yewei Gu Xiaowei Yi Xianfeng Zhao 27 24 0 18 Oct 2021
Taming Visually Guided Sound Generation Vladimir E. Iashin Esa Rahtu VLM 28 121 0 17 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research Tomoki Hayashi Ryuichi Yamamoto Takenori Yoshimura Peter Wu Jiatong Shi Takaaki Saeki Yooncheol Ju Yusuke Yasuda Shinnosuke Takamichi Shinji Watanabe VLM 50 60 0 15 Oct 2021
LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example Hieu-Thi Luong Junichi Yamagishi 44 9 0 11 Oct 2021
Denoising Diffusion Gamma Models Eliya Nachmani S. Robin Lior Wolf DiffM VLM 20 30 0 10 Oct 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis Manh Luong Viet-Anh Tran 6 2 0 27 Sep 2021
Bilateral Denoising Diffusion Models Max W. Y. Lam Jun Wang Rongjie Huang Dan Su Dong Yu DiffM 27 42 0 26 Aug 2021
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate Ahmed Mustafa Jan Büthe Srikanth Korse Kishan Gupta Guillaume Fuchs N. Pia 18 18 0 09 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs J. Nistal Stefan Lattner G. Richard 21 8 0 03 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing Zhaofeng Shi 24 7 0 01 Aug 2021
Generative Models for Security: Attacks, Defenses, and Opportunities L. A. Bauer Vincent Bindschaedler 25 4 0 21 Jul 2021
Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech C. Steinmetz V. Ithapu P. Calamia 46 39 0 15 Jul 2021
Adversarial Auto-Encoding for Packet Loss Concealment Santiago Pascual Joan Serrà Jordi Pons 31 27 0 07 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 352 0 29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery Muvazima Mansoor Srikanth Chandar Ramamoorthy Srinath 18 0 0 27 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis Jian Cong Shan Yang Lei Xie Dan Su DRL 18 29 0 21 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Nanxin Chen Yu Zhang Heiga Zen Ron J. Weiss Mohammad Norouzi Najim Dehak William Chan DiffM 21 88 0 17 Jun 2021
Non Gaussian Denoising Diffusion Models Eliya Nachmani Robin San Roman Lior Wolf VLM DiffM 32 48 0 14 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example Gal Greshler Tamar Rott Shaham T. Michaeli 18 25 0 11 Jun 2021