fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

14 September 2021

Yossi Adi

Papers citing "fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit"

43 / 43 papers shown

Title
Slamming: Training a Speech Language Model on One GPU in a Day Gallil Maimon Avishai Elmakies Yossi Adi 54 3 0 19 Feb 2025
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 43 355 0 29 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units Wei-Ning Hsu Benjamin Bolte Yao-Hung Hubert Tsai Kushal Lakhotia Ruslan Salakhutdinov Abdel-rahman Mohamed SSL 127 2,879 0 14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations Adam Polyak Yossi Adi Jade Copet Eugene Kharitonov Kushal Lakhotia Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux 50 311 0 01 Apr 2021
Generative Spoken Language Modeling from Raw Audio Kushal Lakhotia Evgeny Kharitonov Wei-Ning Hsu Yossi Adi Adam Polyak ... Tu Nguyen Jade Copet Alexei Baevski A. Mohamed Emmanuel Dupoux AuLLM 214 353 0 01 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David Harwath Christopher Song James R. Glass CLIP 47 67 0 31 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling Chen Zhang Yi Ren Xu Tan Jinglin Liu Ke-jun Zhang Tao Qin Sheng Zhao Tie-Yan Liu DiffM 49 37 0 17 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 43 99 0 06 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 96 1,891 0 12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq Changhan Wang Yun Tang Xutai Ma Anne Wu Sravya Popuri Dmytro Okhonko J. Pino VLM LRM 51 267 0 11 Oct 2020
Unsupervised Cross-Domain Singing Voice Conversion Adam Polyak Lior Wolf Yossi Adi Yaniv Taigman 35 44 0 06 Aug 2020
Real Time Speech Enhancement in the Waveform Domain Alexandre Défossez Gabriel Synnaeve Yossi Adi 57 453 0 23 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 160 5,677 0 20 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 48 109 0 08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 82 1,382 0 08 Jun 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem Tomoki Hayashi Shinji Watanabe 37 32 0 12 May 2020
Common Voice: A Massively-Multilingual Speech Corpus Rosana Ardila Megan Branson Kelly Davis Michael Henretty M. Kohler Josh Meyer Reuben Morais Lindsay Saunders Francis M. Tyers Gregor Weber VLM 51 1,547 0 13 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury ... Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai Soumith Chintala ODL 218 42,038 0 03 Dec 2019
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech David Harwath Wei-Ning Hsu James R. Glass 39 84 0 21 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe Tomoki Toda K. Takeda Yu Zhang Xu Tan VLM 55 203 0 24 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations Alexei Baevski Steffen Schneider Michael Auli SSL 77 662 0 12 Oct 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 50 127 0 25 Sep 2019
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks Changhan Wang Anirudh Jain Danlu Chen Jiatao Gu 27 29 0 12 Sep 2019
Facebook FAIR's WMT19 News Translation Task Submission Nathan Ng Kyra Yee Alexei Baevski Myle Ott Michael Auli Sergey Edunov VLM 55 394 0 15 Jul 2019
Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text M. Baskar Shinji Watanabe Ramón Fernández Astudillo Takaaki Hori L. Burget J. Černocký 55 40 0 30 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model Ye Jia Ron J. Weiss Fadi Biadsy Wolfgang Macherey Melvin Johnson Zhiwen Chen Yonghui Wu 48 225 0 12 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling Myle Ott Sergey Edunov Alexei Baevski Angela Fan Sam Gross Nathan Ng David Grangier Michael Auli VLM FaML 76 3,141 0 01 Apr 2019
Cycle-consistency training for end-to-end speech recognition Takaaki Hori Ramón Fernández Astudillo Tomoki Hayashi Yu Zhang Shinji Watanabe Jonathan Le Roux 59 87 0 02 Nov 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis R. Prenger Rafael Valle Bryan Catanzaro 120 1,024 0 31 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 38 275 0 16 Oct 2018
Back-Translation-Style Data Augmentation for End-to-End ASR Tomoki Hayashi Shinji Watanabe Yu Zhang Tomoki Toda Takaaki Hori Ramón Fernández Astudillo K. Takeda 52 103 0 28 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhiwen Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 240 826 0 12 Jun 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder K. Akuzawa Yusuke Iwasawa Y. Matsuo 25 139 0 06 Apr 2018
ESPnet: End-to-End Speech Processing Toolkit Shinji Watanabe Takaaki Hori Shigeki Karita Tomoki Hayashi Jiro Nishitoba ... Jahn Heymann Sanjeev Khudanpur Nanxin Chen Adithya Renduchintala Tsubasa Ochiai VLM 79 1,492 0 30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 40 550 0 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 57 822 0 23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 63 2,684 0 16 Dec 2017
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 149 4,928 0 02 Nov 2017
Mixed Precision Training Paulius Micikevicius Sharan Narang Jonah Alben G. Diamos Erich Elsen ... Boris Ginsburg Michael Houston Oleksii Kuchaiev Ganesh Venkatesh Hao Wu 122 1,779 0 10 Oct 2017
Listening while Speaking: Speech Chain by Deep Learning Andros Tjandra S. Sakti Satoshi Nakamura AuLLM 147 165 0 16 Jul 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech Sercan O. Arik G. Diamos Andrew Gibiansky John Miller Kainan Peng Ming-Yu Liu Jonathan Raiman Yanqi Zhou 59 495 0 24 May 2017
Sequence-to-Sequence Models Can Directly Translate Foreign Speech Ron J. Weiss J. Chorowski Navdeep Jaitly Yonghui Wu Zhiwen Chen 71 341 0 24 Mar 2017
End-to-End Text-Dependent Speaker Verification G. Heigold Ignacio López Moreno Samy Bengio Noam M. Shazeer 43 585 0 27 Sep 2015