
Unsupervised ASR via Cross-Lingual Pseudo-Labeling

19 May 2023
Tatiana Likhomanenko
Loren Lugosch
R. Collobert
Abstract

Recent work has shown that it is possible to train an unsupervised automatic speech recognition (ASR) system using only unpaired audio and text. Existing unsupervised ASR methods assume that no labeled data can be used for training. We argue that even if one does not have any labeled audio for a given language, there is always labeled data available for other languages. We show that it is possible to use character-level acoustic models (AMs) from other languages to bootstrap an unsupervised AM in a new language. Here, "unsupervised" means no labeled audio is available for the target language. Our approach is based on two key ingredients: (i) generating pseudo-labels (PLs) of the target language using some other language AM and (ii) constraining these PLs with a target language model. Our approach is effective on Common Voice: e.g. transfer of an English AM to Swahili achieves 18% WER. It also outperforms character-based wav2vec-U 2.0 by 15% absolute WER on LJSpeech with 800h of labeled German data instead of 60k hours of unlabeled English data.
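
The abstract describes the two ingredients only at a high level. The toy Python sketch below is an illustration of that pipeline, not the authors' code: the "acoustic model", the character-bigram language model, and the frame-synchronous beam search are all simplified stand-ins, and a real system would use something like a CTC acoustic model (with blank tokens and repeat collapsing) decoded with a proper target-language LM before training on the resulting pseudo-labels.

```python
# Toy sketch of cross-lingual pseudo-labeling (illustration only, not the
# authors' implementation). A character-level "acoustic model" from a source
# language scores frames of target-language audio; a simple character-bigram
# LM of the target language constrains the hypotheses. The selected
# transcripts would serve as pseudo-labels for training a target-language AM.

import math
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz ")  # shared character set

def source_am_logprobs(audio_frames: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a source-language character AM: per-frame log-probs over
    the shared alphabet. In the paper this would be e.g. an English model
    applied to audio of the target language (say, Swahili)."""
    logits = rng.normal(size=(audio_frames, len(ALPHABET)))
    return logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)

def target_lm_score(text: str, bigram_logp: dict) -> float:
    """Character-bigram LM of the *target* language, used to constrain PLs."""
    score = 0.0
    for a, b in zip(text, text[1:]):
        score += bigram_logp.get((a, b), math.log(1e-4))
    return score

def pseudo_label(logprobs: np.ndarray, bigram_logp: dict,
                 beam: int = 8, lm_weight: float = 1.0) -> str:
    """Frame-synchronous beam search combining AM scores with the target LM.
    (A real decoder would handle CTC blanks and repeats; omitted here.)"""
    hyps = {"": 0.0}
    for frame in logprobs:
        scored = {}
        for prefix, am_score in hyps.items():
            for ci in np.argsort(frame)[-3:]:           # top AM characters
                scored[prefix + ALPHABET[ci]] = am_score + frame[ci]
        # keep the beam, rescoring with the target-LM constraint
        hyps = dict(sorted(
            scored.items(),
            key=lambda kv: kv[1] + lm_weight * target_lm_score(kv[0], bigram_logp),
            reverse=True)[:beam])
    return max(hyps, key=lambda h: hyps[h] + lm_weight * target_lm_score(h, bigram_logp))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Tiny "target LM": favour vowels after any character, purely illustrative.
    bigram_logp = {(a, b): math.log(0.1) for a in ALPHABET for b in "aeiou"}
    pl = pseudo_label(source_am_logprobs(audio_frames=12, rng=rng), bigram_logp)
    print("pseudo-label:", pl)
    # In the actual method, pseudo-labels produced this way (from real audio
    # and a real source-language AM) train the target-language AM, and the
    # labeling/training cycle is iterated.
```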
