
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

10 June 2025
Cheng-Kang Chou
Chan-Jan Hsu
Ho-Lam Chung
Liang-Hsuan Tseng
Hsi-Chun Cheng
Yu-Kuan Fu
Kuan Po Huang
Hung-yi Lee
arXiv (abs) | PDF | HTML
Main: 6 pages, 1 figure
Bibliography: 2 pages
Abstract

We propose a self-refining framework that enhances ASR performance using only unlabeled datasets. The process starts with an existing ASR model generating pseudo-labels on unannotated speech, which are then used to train a high-fidelity text-to-speech (TTS) system. The synthesized speech-text pairs are then bootstrapped back into the original ASR system, completing the closed-loop self-improvement cycle. We demonstrate the effectiveness of the framework on Taiwanese Mandarin speech. Leveraging 6,000 hours of unlabeled speech, a moderate amount of text data, and synthetic content from AI models, we adapt Whisper-large-v2 into a specialized model, Twister. Twister reduces error rates by up to 20% on Mandarin and 50% on Mandarin-English code-switching benchmarks compared to Whisper. The results highlight the framework as a compelling alternative to pseudo-labeling self-distillation approaches and provide a practical pathway for improving ASR performance in low-resource or domain-specific settings.
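
The abstract describes a closed loop with three stages: pseudo-label unannotated speech with an existing ASR model, train a TTS system on those pseudo-labels, then fine-tune the ASR model on TTS-synthesized speech-text pairs. The sketch below only illustrates that data flow under stated assumptions; every function body, name, and path is a hypothetical placeholder, not the paper's actual implementation or training recipe.

# Minimal sketch of the self-refining ASR/TTS loop described in the abstract.
# All functions are hypothetical placeholders standing in for the paper's
# models (e.g. Whisper-large-v2 and the high-fidelity TTS system); no real
# training code or library APIs are reproduced here.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechTextPair:
    audio_path: str   # path to a speech clip (real or synthesized)
    text: str         # transcript (pseudo-label or TTS input text)


def pseudo_label(asr_model: Callable[[str], str],
                 unlabeled_audio: List[str]) -> List[SpeechTextPair]:
    """Step 1: the existing ASR model transcribes unannotated speech,
    producing pseudo-labeled speech-text pairs."""
    return [SpeechTextPair(path, asr_model(path)) for path in unlabeled_audio]


def train_tts(pairs: List[SpeechTextPair]) -> Callable[[str], str]:
    """Step 2: train a TTS system on the pseudo-labeled pairs.
    Placeholder: returns a function mapping text to a synthetic audio path."""
    def tts_model(text: str) -> str:
        return f"synth/{abs(hash(text))}.wav"   # stand-in for real synthesis
    return tts_model


def synthesize_corpus(tts_model: Callable[[str], str],
                      texts: List[str]) -> List[SpeechTextPair]:
    """Step 3: synthesize speech for a text corpus, yielding aligned
    speech-text pairs for ASR training."""
    return [SpeechTextPair(tts_model(t), t) for t in texts]


def finetune_asr(asr_model: Callable[[str], str],
                 synthetic_pairs: List[SpeechTextPair]) -> Callable[[str], str]:
    """Step 4: bootstrap the synthetic pairs back into the original ASR model.
    Placeholder: returns the model unchanged."""
    return asr_model


def self_refine(asr_model: Callable[[str], str],
                unlabeled_audio: List[str],
                text_corpus: List[str]) -> Callable[[str], str]:
    """One pass of the closed-loop self-improvement cycle."""
    pseudo_pairs = pseudo_label(asr_model, unlabeled_audio)
    tts_model = train_tts(pseudo_pairs)
    synthetic_pairs = synthesize_corpus(tts_model, text_corpus)
    return finetune_asr(asr_model, synthetic_pairs)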

View on arXiv: https://arxiv.org/abs/2506.11130
@article{chou2025_2506.11130,
  title={A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data},
  author={Cheng-Kang Chou and Chan-Jan Hsu and Ho-Lam Chung and Liang-Hsuan Tseng and Hsi-Chun Cheng and Yu-Kuan Fu and Kuan Po Huang and Hung-Yi Lee},
  journal={arXiv preprint arXiv:2506.11130},
  year={2025}
}