Semi-supervised Speech Enhancement in Modulation Subspace

29 September 2016

Abstract

Previous studies show that existing speech enhancement algorithms can improve speech quality but not speech intelligibility. In this study, we propose a modulation subspace (MS) based speech enhancement framework, in which the spectrogram of noisy speech is decomposed into the product of a spectral envelope subspace and a spectral details subspace. This decoupling makes it possible to specifically target the noise components that most degrade intelligibility. Two supervised low-rank and sparse decomposition schemes are developed in the spectral envelope subspace to obtain a robust recovery of speech components. A Bayesian formulation of non-negative matrix factorization (NMF) is used to learn the speech dictionary from the spectral envelope subspace of clean speech samples. In the spectral details subspace, standard robust principal component analysis (RPCA) is applied to extract the speech components. Validation results show that, compared with four state-of-the-art speech enhancement algorithms (MMSE-SPP, NMF-RPCA, RPCA, and LARC), both proposed MS-based algorithms achieve higher perceptual quality and are also superior at improving speech intelligibility.
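To make the multiplicative envelope/details split concrete, here is a minimal sketch. The abstract does not say how the decoupling is computed, so this is an assumption for illustration only, not the paper's method: it low-pass filters each frame's log spectrum along the frequency axis (similar in spirit to cepstral liftering) to get a smooth envelope, and treats the remainder as the details. The function name `split_envelope_details`, the cutoff `n_keep`, and the random test spectrogram are all hypothetical.

```python
import numpy as np

def split_envelope_details(mag, n_keep=20, eps=1e-10):
    """Split a magnitude spectrogram (freq x time) into envelope * details.

    Assumption: the envelope is the slowly varying part of the log spectrum
    along frequency. This is a stand-in, not the paper's own decomposition.
    """
    log_mag = np.log(mag + eps)
    n_freq = log_mag.shape[0]
    # Transform each frame's log spectrum along the frequency axis.
    coeffs = np.fft.rfft(log_mag, axis=0)
    # Keep only the n_keep lowest components (hypothetical cutoff).
    coeffs[n_keep:] = 0.0
    smooth = np.fft.irfft(coeffs, n=n_freq, axis=0)
    envelope = np.exp(smooth)             # smooth spectral shape
    details = np.exp(log_mag - smooth)    # fine structure left over
    return envelope, details

# Synthetic stand-in for a noisy-speech magnitude spectrogram.
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((257, 50))) + 0.1
envelope, details = split_envelope_details(mag)
```

Because the split is done in the log domain, the element-wise product `envelope * details` reproduces the input magnitude up to the small `eps` offset, which is the property the abstract relies on when it processes the two subspaces separately.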
