ResearchTrend.AI
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

14 September 2021
Changhan Wang
Wei-Ning Hsu
Yossi Adi
Adam Polyak
Ann Lee
Peng-Jen Chen
Jiatao Gu
J. Pino
    VLM

Papers citing "fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit"

43 / 43 papers shown
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
54
3
0
19 Feb 2025
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
43
355
0
29 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
127
2,879
0
14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
50
311
0
01 Apr 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
214
353
0
01 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
47
67
0
31 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang
Yi Ren
Xu Tan
Jinglin Liu
Ke-jun Zhang
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
49
37
0
17 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
43
99
0
06 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
96
1,891
0
12 Oct 2020
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
VLM
LRM
51
267
0
11 Oct 2020
Unsupervised Cross-Domain Singing Voice Conversion
Adam Polyak
Lior Wolf
Yossi Adi
Yaniv Taigman
35
44
0
06 Aug 2020
Real Time Speech Enhancement in the Waveform Domain
Alexandre Défossez
Gabriel Synnaeve
Yossi Adi
57
453
0
23 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
160
5,677
0
20 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
48
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
82
1,382
0
08 Jun 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
37
32
0
12 May 2020
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
51
1,547
0
13 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
218
42,038
0
03 Dec 2019
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David Harwath
Wei-Ning Hsu
James R. Glass
39
84
0
21 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
Tomoki Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
55
203
0
24 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
77
662
0
12 Oct 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
50
127
0
25 Sep 2019
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks
Changhan Wang
Anirudh Jain
Danlu Chen
Jiatao Gu
27
29
0
12 Sep 2019
Facebook FAIR's WMT19 News Translation Task Submission
Nathan Ng
Kyra Yee
Alexei Baevski
Myle Ott
Michael Auli
Sergey Edunov
VLM
55
394
0
15 Jul 2019
Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text
M. Baskar
Shinji Watanabe
Ramón Fernández Astudillo
Takaaki Hori
L. Burget
J. Černocký
55
40
0
30 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhiwen Chen
Yonghui Wu
48
225
0
12 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
76
3,141
0
01 Apr 2019
Cycle-consistency training for end-to-end speech recognition
Takaaki Hori
Ramón Fernández Astudillo
Tomoki Hayashi
Yu Zhang
Shinji Watanabe
Jonathan Le Roux
59
87
0
02 Nov 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
120
1,024
0
31 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
...
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
38
275
0
16 Oct 2018
Back-Translation-Style Data Augmentation for End-to-End ASR
Tomoki Hayashi
Shinji Watanabe
Yu Zhang
Tomoki Toda
Takaaki Hori
Ramón Fernández Astudillo
K. Takeda
52
103
0
28 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
240
826
0
12 Jun 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder
K. Akuzawa
Yusuke Iwasawa
Y. Matsuo
25
139
0
06 Apr 2018
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
...
Jahn Heymann
Sanjeev Khudanpur
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
VLM
79
1,492
0
30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
40
550
0
24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
57
822
0
23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
63
2,684
0
16 Dec 2017
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
149
4,928
0
02 Nov 2017
Mixed Precision Training
Paulius Micikevicius
Sharan Narang
Jonah Alben
G. Diamos
Erich Elsen
...
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
Hao Wu
122
1,779
0
10 Oct 2017
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
147
165
0
16 Jul 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Sercan O. Arik
G. Diamos
Andrew Gibiansky
John Miller
Kainan Peng
Ming-Yu Liu
Jonathan Raiman
Yanqi Zhou
59
495
0
24 May 2017
Sequence-to-Sequence Models Can Directly Translate Foreign Speech
Ron J. Weiss
J. Chorowski
Navdeep Jaitly
Yonghui Wu
Zhiwen Chen
71
341
0
24 Mar 2017
End-to-End Text-Dependent Speaker Verification
G. Heigold
Ignacio López Moreno
Samy Bengio
Noam M. Shazeer
43
585
0
27 Sep 2015