LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
Modeling of Audio Discrete Codes

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

5 June 2024

Trung D. Q. Dang

Dung Tran

Papers citing "LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes"

17 / 17 papers shown

Title
SpeakStream: Streaming Text-to-Speech with Interleaved Data Richard He Bai Zijin Gu Tatiana Likhomanenko Navdeep Jaitly AuLLM AI4TS 38 0 0 25 May 2025
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Mateusz Lajszczak Guillermo Cámbara Yang Li Fatih Beyhan Arent van Korlaar ... Bartosz Putrycz Soledad López Gambino Kayeon Yoo Elena Sokolova Thomas Drugman LM&MA 63 86 0 12 Feb 2024
High-Fidelity Audio Compression with Improved RVQGAN Rithesh Kumar Prem Seetharaman Alejandro Luebs I. Kumar Kundan Kumar 88 326 0 11 Jun 2023
AudioLM: a Language Modeling Approach to Audio Generation Zalan Borsos Raphaël Marinier Damien Vincent Eugene Kharitonov Olivier Pietquin ... Dominik Roblek O. Teboul David Grangier Marco Tagliasacchi Neil Zeghidour AuLLM 126 606 0 07 Sep 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation Dongchao Yang Jianwei Yu Helin Wang Wen Wang Chao Weng Yuexian Zou Dong Yu DiffM 79 304 0 20 Jul 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone Edresson Casanova Julian Weber C. Shulby Arnaldo Cândido Júnior Eren Golge M. Ponti 217 407 0 04 Dec 2021
DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering Hendrik Schröter Alberto N. Escalante T. Rosenkranz Andreas Maier 51 75 0 11 Oct 2021
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors Chandan K. A. Reddy Vishak Gopal Ross Cutler 73 215 0 05 Oct 2021
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition Qiantong Xu Alexei Baevski Michael Auli VLM 109 86 0 23 Sep 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units Wei-Ning Hsu Benjamin Bolte Yao-Hung Hubert Tsai Kushal Lakhotia Ruslan Salakhutdinov Abdel-rahman Mohamed SSL 149 2,939 0 14 Jun 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model Edresson Casanova C. Shulby Eren Golge Nicolas Müller F. S. Oliveira Arnaldo Cândido Júnior A. S. Soares S. Aluísio M. Ponti 48 100 0 02 Apr 2021
DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors Chandan K. A. Reddy Vishak Gopal Ross Cutler 64 308 0 28 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 241 5,774 0 20 Jun 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification Brecht Desplanques Jenthe Thienpondt Kris Demuynck 72 1,330 0 14 May 2020
Libri-Light: A Benchmark for ASR with Limited or No Supervision Jacob Kahn M. Rivière Weiyi Zheng Evgeny Kharitonov Qiantong Xu ... Tatiana Likhomanenko Gabriel Synnaeve Armand Joulin Abdel-rahman Mohamed Emmanuel Dupoux AuLLM 55 670 0 17 Dec 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech Heiga Zen Viet Dang R. Clark Yu Zhang Ron J. Weiss Ye Jia Zhiwen Chen Yonghui Wu 96 947 0 05 Apr 2019
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 77 2,694 0 16 Dec 2017