SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms

16 June 2025

Main: 4 pages; Bibliography: 1 page; 4 figures; 3 tables

Abstract

Speech pre-processing techniques, such as denoising, de-reverberation, and separation, are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, resulting in residual noise or the introduction of new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To address this, we introduce SpeechRefiner, a post-processing tool that utilizes Conditional Flow Matching (CFM) to improve the perceptual quality of speech. In this study, we benchmark SpeechRefiner against recent task-specific refinement methods and evaluate its performance within our internal processing pipeline, which integrates multiple front-end algorithms. Experiments show that SpeechRefiner exhibits strong generalization across diverse impairment sources, significantly enhancing speech perceptual quality. Audio demos can be found at this https URL.
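For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the standard scale-invariant signal-to-noise ratio (SI-SNR) in plain Python. It is not taken from the paper; the function name `si_snr` and the `eps` stabilizer are assumptions made for illustration. The score depends only on how much of the estimate's energy lies along the reference waveform, which is one reason a purely energy-based number like this may not reflect artifacts that listeners notice.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not along the reference is "noise".
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged; only the `eps` terms introduce a negligible difference.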

@article{li2025_2506.13709,
  title={SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms},
  author={Sirui Li and Shuai Wang and Zhijun Liu and Zhongjie Jiang and Yannan Wang and Haizhou Li},
  journal={arXiv preprint arXiv:2506.13709},
  year={2025}
}
