Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching

19 June 2025

Main: 4 pages

1 figure

Bibliography: 1 page

2 tables

Abstract

Dysarthria is a motor speech disorder of neurological origin that significantly impairs speech intelligibility, often leaving affected individuals unable to communicate effectively. This motivates the development of robust dysarthric-to-regular speech conversion techniques. In this work, we investigate the utility and limitations of self-supervised learning (SSL) features and their quantized representations as an alternative to mel-spectrograms for speech generation. We also explore methods to mitigate speaker variability by generating clean speech in a single-speaker voice using features extracted from WavLM. To this end, we propose a fully non-autoregressive approach that uses Conditional Flow Matching (CFM) with Diffusion Transformers to learn a direct mapping from dysarthric to clean speech. Our findings show that discrete acoustic units improve intelligibility and converge faster than traditional mel-spectrogram-based approaches.
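
For readers unfamiliar with Conditional Flow Matching, the sketch below illustrates the general idea on plain Python lists. It is not the authors' implementation: it assumes the common straight-line (optimal-transport style) probability path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0, and the function names `cfm_training_pair`, `cfm_loss`, and `euler_sample` are hypothetical. The abstract does not specify the path, the source distribution, or the sampler used in the paper, and the Diffusion Transformer that would predict the velocity is replaced here by an arbitrary callable.

```python
def cfm_training_pair(x0, x1, t):
    """Return (x_t, target_velocity) for a straight-line path (an assumption).

    x0: source feature vector (for example noise or a source-speech feature)
    x1: target feature vector (for example a clean-speech feature)
    t:  time in [0, 1]
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x0) != len(x1):
        raise ValueError("x0 and x1 must have the same length")
    # Point on the straight line between source and target at time t.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight line the velocity is constant: target minus source.
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    n = len(target_velocity)
    return sum((p - v) ** 2
               for p, v in zip(predicted_velocity, target_velocity)) / n


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, a model would be fed `x_t` and `t` (plus any conditioning) and optimized with `cfm_loss` against the target velocity; at inference, `euler_sample` would be called with the trained model as `velocity_fn`. Because all output frames are produced by integrating the same velocity field in parallel, this style of generation is non-autoregressive.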


@article{das2025_2506.16127,
  title={Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching},
  author={Shoutrik Das and Nishant Singh and Arjun Gangwar and S Umesh},
  journal={arXiv preprint arXiv:2506.16127},
  year={2025}
}
