Efficient Neural Audio Synthesis

23 February 2018

Papers citing "Efficient Neural Audio Synthesis"

50 / 469 papers shown

Title
WaveFake: A Data Set to Facilitate Audio Deepfake Detection Joel Frank Lea Schönherr DiffM 204 131 0 04 Nov 2021
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses Shengyuan Xu Wenxiao Zhao Jing Guo 63 12 0 01 Nov 2021
TorchAudio: Building Blocks for Audio and Speech Processing Yao-Yuan Yang Moto Hira Zhaoheng Ni Anjali Chourdia Artyom Astafurov ... Mehrzad Samadi Shinji Watanabe Soumith Chintala Vincent Quenneville-Bélair Yangyang Shi 106 169 0 28 Oct 2021
Chunked Autoregressive GAN for Conditional Waveform Synthesis Max Morrison Rithesh Kumar Kundan Kumar Prem Seetharaman Aaron Courville Yoshua Bengio GAN 130 72 0 19 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation Fengyu Yang Jian Luan Yujun Wang 137 5 0 19 Oct 2021
KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke Xiaobin Zhuang Huiran Yu Weifeng Zhao Tao Jiang Peng Hu 90 6 0 18 Oct 2021
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis Yongmao Zhang Jian Cong Heyang Xue Lei Xie Pengcheng Zhu Mengxiao Bi 97 77 0 17 Oct 2021
PixelPyramids: Exact Inference Models from Lossless Image Pyramids Shweta Mahajan Stefan Roth TPM 51 2 0 17 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation Rongjie Huang Chenye Cui Feiyang Chen Yi Ren Jinglin Liu Zhou Zhao Baoxing Huai N. Yuan GAN 203 63 0 14 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data Haitong Zhang Yue Lin 56 0 0 14 Oct 2021
Revisiting IPA-based Cross-lingual Text-to-speech Haitong Zhang Haoyue Zhan Yang Zhang Xinyuan Yu Yue Lin 61 7 0 14 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding Sergey Nikonorov Berrak Sisman Mingyang Zhang Haizhou Li 41 3 0 13 Oct 2021
Denoising Diffusion Gamma Models Eliya Nachmani S. Robin Lior Wolf DiffM VLM 81 32 0 10 Oct 2021
Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis Mu Yang Shaojin Ding Tianlong Chen Tong Wang Zhangyang Wang CLL 73 5 0 09 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis Cheng Gong Longbiao Wang Zhenhua Ling Ju Zhang Jianwu Dang 48 5 0 09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 72 16 0 08 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over Junchen Lu Berrak Sisman Rui Liu Mingyang Zhang Haizhou Li DiffM 91 20 0 07 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models J. Tan C. Chan Joon Huang Chuah VLM 132 16 0 07 Oct 2021
Emphasis control for parallel neural TTS Shreyas Seshadri T. Raitio D. Castellani Jiangchuan Li 120 11 0 06 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 78 23 0 06 Oct 2021
Autoregressive Diffusion Models Emiel Hoogeboom Alexey A. Gritsenko Jasmijn Bastings Ben Poole Rianne van den Berg Tim Salimans DiffM 127 155 0 05 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis Cheng-I Jeff Lai Erica Cooper Yang Zhang Shiyu Chang Kaizhi Qian ... Yung-Sung Chuang Alexander H. Liu Junichi Yamagishi David D. Cox James R. Glass 69 6 0 04 Oct 2021
Powerpropagation: A sparsity inducing weight reparameterisation Jonathan Richard Schwarz Siddhant M. Jayakumar Razvan Pascanu P. Latham Yee Whye Teh 194 55 0 01 Oct 2021
On-device neural speech synthesis Sivanand Achanta Albert Antony L. Golipour Jiangchuan Li T. Raitio ... Francesco Rossi Jennifer Shi Jaimin Upadhyay David Winarsky Hepeng Zhang 108 17 0 17 Sep 2021
DDS: A new device-degraded speech dataset for speech enhancement Haoyu Li Junichi Yamagishi 92 9 0 16 Sep 2021
Bilateral Denoising Diffusion Models Max W. Y. Lam Jun Wang Rongjie Huang Jane Polak Scowcroft Dong Yu DiffM 83 43 0 26 Aug 2021
Combining speakers of multiple languages to improve quality of neural voices Javier Latorre Charlotte Bailleul Tuuli H. Morrill Alistair Conkie Y. Stylianou 64 8 0 17 Aug 2021
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate Ahmed Mustafa Jan Büthe Srikanth Korse Kishan Gupta Guillaume Fuchs N. Pia 131 19 0 09 Aug 2021
A Tandem Framework Balancing Privacy and Security for Voice User Interfaces Ranya Aloufi Hamed Haddadi David E. Boyle 90 3 0 21 Jul 2021
Approximation Theory of Convolutional Architectures for Time Series Modelling Haotian Jiang Zhong Li Qianxiao Li AI4TS 83 12 0 20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation Ye Jia Michelle Tadmor Ramanovich Tal Remez Roi Pomerantz 105 73 0 19 Jul 2021
Codified audio language modeling learns useful representations for music information retrieval Rodrigo Castellon Chris Donahue Percy Liang 146 91 0 12 Jul 2021
SoundStream: An End-to-End Neural Audio Codec Neil Zeghidour Alejandro Luebs Ahmed Omran Jan Skoglund Marco Tagliasacchi AI4TS 120 806 0 07 Jul 2021
Adversarial Auto-Encoding for Packet Loss Concealment Santiago Pascual Joan Serrà Jordi Pons 71 29 0 07 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures Prateek Verma C. Chafe 79 29 0 30 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 133 359 0 29 Jun 2021
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity Shiwei Liu Tianlong Chen Zahra Atashgahi Xiaohan Chen Ghada Sokar Elena Mocanu Mykola Pechenizkiy Zhangyang Wang Decebal Constantin Mocanu OOD 129 53 0 28 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition Zhengxi Liu Y. Qian DRL 49 10 0 25 Jun 2021
Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech Raahil Shah Kamil Pokora Abdelhamid Ezzerg V. Klimkov Goeric Huybrechts Bartosz Putrycz Daniel Korzekwa Thomas Merritt 64 26 0 24 Jun 2021
Distilling the Knowledge from Conditional Normalizing Flows Dmitry Baranchuk Vladimir Aliev Artem Babenko BDL 85 2 0 24 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis Jian Cong Shan Yang Lei Xie Jane Polak Scowcroft DRL 110 29 0 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Jane Polak Scowcroft 73 30 0 21 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion Zhichao Wang Xinyong Zhou Fengyu Yang Tao Li Hongqiang Du Lei Xie Wendong Gan Haitao Chen Hai Li 65 22 0 16 Jun 2021
Improving the expressiveness of neural vocoding with non-affine Normalizing Flows Adam Gabryś Yunlong Jiao V. Klimkov Daniel Korzekwa Roberto Barra-Chicote 48 1 0 16 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 55 17 0 15 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation Won Jang D. Lim Jaesam Yoon Bongwan Kim Juntae Kim 116 132 0 15 Jun 2021
Non Gaussian Denoising Diffusion Models Eliya Nachmani Robin San Roman Lior Wolf VLM DiffM 83 50 0 14 Jun 2021
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior Sang-gil Lee Heeseung Kim Chaehun Shin Xu Tan Chang-Shu Liu Qi Meng Tao Qin Wei Chen Sung-Hoon Yoon Tie-Yan Liu DiffM 85 89 0 11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 167 903 0 11 Jun 2021
Top-KAST: Top-K Always Sparse Training Siddhant M. Jayakumar Razvan Pascanu Jack W. Rae Simon Osindero Erich Elsen 184 100 0 07 Jun 2021