Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, which relied on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches that use speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, especially in situations where prior linguistic knowledge is not available.
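The abstract describes aligning non-native phones to their native counterparts via attention maps and mining the resulting substitution patterns. A minimal sketch of that idea, assuming a precomputed attention matrix between the two phone sequences (all names and the toy data below are illustrative, not taken from the paper):

```python
# Hypothetical sketch: derive a phone-level alignment from an attention map
# and tally mispronunciation (substitution) patterns. The attention matrix,
# phone sequences, and function names are illustrative assumptions.
from collections import Counter

import numpy as np


def align_phones(attention, native_phones, nonnative_phones):
    """Map each non-native phone to the native phone it attends to most."""
    # attention[i, j]: weight of non-native phone i on native phone j
    best = attention.argmax(axis=1)
    return [(nonnative_phones[i], native_phones[j]) for i, j in enumerate(best)]


def mispronunciation_patterns(aligned_pairs):
    """Count (native, produced) substitutions where the phones differ."""
    return Counter((nat, non) for non, nat in aligned_pairs if non != nat)


# Toy example: a Korean-accented realization of "rice" with /r/ -> /l/.
native = ["r", "ai", "s"]
nonnative = ["l", "ai", "s"]
attn = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

pairs = align_phones(attn, native, nonnative)
patterns = mispronunciation_patterns(pairs)
```

In a full system, such discovered substitution counts could be turned into alternative pronunciations in the ASR lexicon, replacing hand-written rules.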
@article{choi2025_2502.00583,
  title   = {Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition},
  author  = {Anna Seo Gyeong Choi and Jonghyeon Park and Myungwoo Oh},
  journal = {arXiv preprint arXiv:2502.00583},
  year    = {2025}
}