No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy because they capture only a listener's hearing thresholds for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that uses speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.
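To make the support-sample idea concrete, the following is a minimal toy sketch and not SSIPNet itself. It assumes each audio clip has already been mapped to a fixed-length embedding (in the paper this role is played by a speech foundation model; here the vectors are supplied by hand), and it swaps the learned network for a plain similarity-weighted average of the listener's known scores. The function name `predict_score` and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math


def predict_score(query_emb, support_embs, support_scores, temperature=0.1):
    """Toy stand-in for support-based prediction (NOT the SSIPNet architecture).

    Predicts a listener's score on a query clip as a softmax-weighted average
    of that listener's known scores, weighted by cosine similarity between the
    query embedding and each support embedding.
    """
    if not support_embs or len(support_embs) != len(support_scores):
        raise ValueError("need one score per support embedding, at least one pair")

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Scaled similarities; a lower temperature concentrates weight on the
    # most similar support clips.
    logits = [cosine(query_emb, s) / temperature for s in support_embs]

    # Numerically stable softmax over the support set.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)

    return sum(w * y for w, y in zip(weights, support_scores)) / total


# Hypothetical embeddings and scores for one listener (illustrative values only).
support_embs = [[1.0, 0.0], [0.0, 1.0]]
support_scores = [90.0, 10.0]
prediction = predict_score([1.0, 0.0], support_embs, support_scores)
```

Because the query embedding matches the first support clip, the prediction lands close to that clip's score. A learned model would replace both the hand-made embeddings and the fixed weighting rule.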
@article{zhou2025_2506.02039,
  title={No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction},
  author={Haoshuai Zhou and Changgeng Mo and Boxuan Cao and Linkai Li and Shan Xiang Wang},
  journal={arXiv preprint arXiv:2506.02039},
  year={2025}
}