Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM

29 May 2025

Main: 4 pages · Bibliography: 1 page · 3 figures · 5 tables

Abstract

Overlapping Speech Detection (OSD) aims to identify regions of a conversation in which multiple speakers talk simultaneously, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that uses a progressive training strategy to strengthen the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve the acoustic representation, we explore state-of-the-art self-supervised learning (SSL) models, including WavLM and wav2vec 2.0, and incorporate a speaker attention module that enriches the features with frame-level speaker information. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set, demonstrating its robustness and effectiveness for OSD.
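The abstract does not spell out how frames are labeled or scored, so the following is only a minimal sketch under stated assumptions. It assumes OSD is treated as frame-level classification in which a frame counts as speech (the VAD subtask) when at least one speaker is active and as overlap when two or more are active. Under this assumption, overlap frames are always a subset of speech frames, which is one way to see why the two subtasks are correlated. It also assumes the reported F1 is computed per frame on the overlap class. The 20 ms frame shift is chosen to match the output frame rate of WavLM features. The function names `speaker_count_per_frame`, `subtask_targets`, and `overlap_f1` are hypothetical, and official AMI scoring may differ (for example in collars or segment handling).

```python
import numpy as np


def speaker_count_per_frame(segments, num_frames, frame_shift=0.02):
    """Count active speakers per frame from (start_s, end_s) speaker turns.

    Assumption: each segment is one speaker's turn, and a single speaker's
    turns do not overlap each other, so overlapping segments mean overlapping
    speakers.
    """
    counts = np.zeros(num_frames, dtype=int)
    for start, end in segments:
        first = round(start / frame_shift)  # first frame index covered
        last = round(end / frame_shift)     # exclusive end frame index
        counts[max(first, 0):min(last, num_frames)] += 1
    return counts


def subtask_targets(counts):
    """Derive nested binary targets: VAD (>= 1 speaker) and overlap (>= 2)."""
    vad = (counts >= 1).astype(int)
    osd = (counts >= 2).astype(int)
    return vad, osd


def overlap_f1(ref, hyp):
    """Frame-level F1 for the overlap class (assumed metric definition)."""
    ref = np.asarray(ref, dtype=bool)
    hyp = np.asarray(hyp, dtype=bool)
    tp = int(np.sum(ref & hyp))
    fp = int(np.sum(~ref & hyp))
    fn = int(np.sum(ref & ~hyp))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two speakers, 0.0-1.0 s and 0.6-1.4 s, on a coarse 0.2 s grid for readability.
    counts = speaker_count_per_frame([(0.0, 1.0), (0.6, 1.4)], num_frames=10, frame_shift=0.2)
    vad, osd = subtask_targets(counts)
    hyp = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]  # toy system output: 1 hit, 1 miss, 1 false alarm
    print(overlap_f1(osd, hyp))  # 0.5
```

In the toy example, the reference contains two overlap frames. The hypothesis finds one of them and adds one false alarm, so precision and recall are both 0.5, giving an F1 of 0.5.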


@article{sun2025_2505.23207,
  title={Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM},
  author={Zhaokai Sun and Li Zhang and Qing Wang and Pan Zhou and Lei Xie},
  journal={arXiv preprint arXiv:2505.23207},
  year={2025}
}
