You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

14 May 2020

Ivan Medennikov

Papers citing "You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation"

25 / 25 papers shown

Title
Contrastive Learning from Synthetic Audio Doppelgängers Manuel Cherep Nikhil Singh 51 1 0 09 Jun 2024
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription A. Andrusenko A. Laptev Ivan Medennikov 27 16 0 22 Apr 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 46 92 0 06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems Nick Rossenbach Albert Zeyer Ralf Schluter Hermann Ney 36 83 0 19 Dec 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures Gabriel Synnaeve Qiantong Xu Jacob Kahn Tatiana Likhomanenko Edouard Grave Vineel Pratap Anuroop Sriram Vitaliy Liptchinsky R. Collobert SSL AI4TS 86 245 0 19 Nov 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Eric Battenberg RJ Skerry-Ryan Soroosh Mariooryad Daisy Stanton David Kao Matt Shannon Tom Bagby 48 114 0 23 Oct 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 40 127 0 25 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications Shigeki Karita Nanxin Chen Tomoki Hayashi Takaaki Hori Hirofumi Inaguma ... Ryuichi Yamamoto Xiao-fei Wang Shinji Watanabe Takenori Yoshimura Wangyou Zhang 44 718 0 13 Sep 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su Dong Yu 44 16 0 30 Aug 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis Eric Battenberg Soroosh Mariooryad Daisy Stanton RJ Skerry-Ryan Matt Shannon David Kao Tom Bagby BDL 32 45 0 08 Jun 2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation Christoph Luscher Eugen Beck Kazuki Irie M. Kitza Wilfried Michel Albert Zeyer Ralf Schluter Hermann Ney VLM 83 234 0 08 May 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition Daniel S. Park William Chan Yu Zhang Chung-Cheng Chiu Barret Zoph E. D. Cubuk Quoc V. Le VLM 140 3,435 0 18 Apr 2019
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models Thomas Drugman Janne Pylkkönen Reinhard Kneser 10 59 0 07 Mar 2019
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation Jason Chun Lok Li R. Gadde Boris Ginsburg Vitaly Lavrukhin 26 55 0 02 Nov 2018
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator Andros Tjandra S. Sakti Satoshi Nakamura 31 44 0 31 Oct 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction J. Valin Jan Skoglund 28 450 0 28 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 36 275 0 16 Oct 2018
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing Taku Kudo John Richardson 131 3,490 0 19 Aug 2018
ESPnet: End-to-End Speech Processing Toolkit Shinji Watanabe Takaaki Hori Shigeki Karita Tomoki Hayashi Jiro Nishitoba ... Jahn Heymann Sanjeev Khudanpur Nanxin Chen Adithya Renduchintala Tsubasa Ochiai VLM 70 1,492 0 30 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 54 822 0 23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 59 2,684 0 16 Dec 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 324 129,831 0 12 Jun 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 120 1,817 0 29 Mar 2017
Attention-Based Models for Speech Recognition J. Chorowski Dzmitry Bahdanau Dmitriy Serdyuk Kyunghyun Cho Yoshua Bengio 88 2,602 0 24 Jun 2015
Auto-Encoding Variational Bayes Diederik P. Kingma Max Welling BDL 332 16,972 0 20 Dec 2013