Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2109.06912
Cited By
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
14 September 2021
Changhan Wang
Wei-Ning Hsu
Yossi Adi
Adam Polyak
Ann Lee
Peng-Jen Chen
Jiatao Gu
J. Pino
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit"
43 / 43 papers shown
Title
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
54
3
0
19 Feb 2025
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
43
355
0
29 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
127
2,879
0
14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
50
311
0
01 Apr 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
214
353
0
01 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
47
67
0
31 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang
Yi Ren
Xu Tan
Jinglin Liu
Ke-jun Zhang
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
49
37
0
17 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
43
99
0
06 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
96
1,891
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
51
267
0
11 Oct 2020
Unsupervised Cross-Domain Singing Voice Conversion
Adam Polyak
Lior Wolf
Yossi Adi
Yaniv Taigman
35
44
0
06 Aug 2020
Real Time Speech Enhancement in the Waveform Domain
Alexandre Défossez
Gabriel Synnaeve
Yossi Adi
57
453
0
23 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
160
5,677
0
20 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
48
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
82
1,382
0
08 Jun 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
37
32
0
12 May 2020
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
51
1,547
0
13 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
218
42,038
0
03 Dec 2019
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David Harwath
Wei-Ning Hsu
James R. Glass
39
84
0
21 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
Tomoki Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
55
203
0
24 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
77
662
0
12 Oct 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
50
127
0
25 Sep 2019
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks
Changhan Wang
Anirudh Jain
Danlu Chen
Jiatao Gu
27
29
0
12 Sep 2019
Facebook FAIR's WMT19 News Translation Task Submission
Nathan Ng
Kyra Yee
Alexei Baevski
Myle Ott
Michael Auli
Sergey Edunov
VLM
55
394
0
15 Jul 2019
Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text
M. Baskar
Shinji Watanabe
Ramón Fernández Astudillo
Takaaki Hori
L. Burget
J. Černocký
55
40
0
30 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhiwen Chen
Yonghui Wu
48
225
0
12 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
76
3,141
0
01 Apr 2019
Cycle-consistency training for end-to-end speech recognition
Takaaki Hori
Ramón Fernández Astudillo
Tomoki Hayashi
Yu Zhang
Shinji Watanabe
Jonathan Le Roux
59
87
0
02 Nov 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
120
1,024
0
31 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
...
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
38
275
0
16 Oct 2018
Back-Translation-Style Data Augmentation for End-to-End ASR
Tomoki Hayashi
Shinji Watanabe
Yu Zhang
Tomoki Toda
Takaaki Hori
Ramón Fernández Astudillo
K. Takeda
52
103
0
28 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
240
826
0
12 Jun 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder
K. Akuzawa
Yusuke Iwasawa
Y. Matsuo
25
139
0
06 Apr 2018
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
...
Jahn Heymann
Sanjeev Khudanpur
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
VLM
79
1,492
0
30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
40
550
0
24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
57
822
0
23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
63
2,684
0
16 Dec 2017
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
149
4,928
0
02 Nov 2017
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
122
1,779
0
10 Oct 2017
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
147
165
0
16 Jul 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Sercan O. Arik
G. Diamos
Andrew Gibiansky
John Miller
Kainan Peng
Ming-Yu Liu
Jonathan Raiman
Yanqi Zhou
59
495
0
24 May 2017
Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Ron J. Weiss
J. Chorowski
Navdeep Jaitly
Yonghui Wu
Zhiwen Chen
71
341
0
24 Mar 2017
End-to-End Text-Dependent Speaker Verification
G. Heigold
Ignacio López Moreno
Samy Bengio
Noam M. Shazeer
43
585
0
27 Sep 2015
1