2207.04659
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
11 July 2022
Naoki Makishima
Satoshi Suzuki
Atsushi Ando
Ryo Masumura
Papers citing
"Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data"
13 / 13 papers shown
Title
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2
Zackary Rackauckas
Julia Hirschberg
40
0
0
22 May 2025
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
96
84
0
28 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
105
70
0
06 Mar 2021
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis
Min-Jae Hwang
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
44
32
0
26 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
105
1,411
0
08 Jun 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
64
42
0
10 May 2020
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
67
128
0
25 Sep 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
104
959
0
05 Apr 2019
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
84
117
0
30 Aug 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
356
2,287
0
14 Jun 2018
Attentive Statistics Pooling for Deep Speaker Embedding
K. Okabe
Takafumi Koshinaka
Koichi Shinoda
113
531
0
29 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
85
2,704
0
16 Dec 2017
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Samy Bengio
Oriol Vinyals
Navdeep Jaitly
Noam M. Shazeer
154
2,039
0
09 Jun 2015