
SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification

20 May 2025

Theo Lepage

Reda Dehak


Main: 4 pages, Bibliography: 1 page, 2 figures, 3 tables

Abstract

Self-Supervised Learning (SSL) has led to considerable progress in Speaker Verification (SV). The standard framework uses same-utterance positive sampling and data augmentation to generate anchor-positive pairs of the same speaker. This is a major limitation, as this strategy primarily encodes channel information from the recording condition, which is shared by the anchor and the positive. We propose a new positive sampling technique to address this bottleneck: Self-Supervised Positive Sampling (SSPS). For a given anchor, SSPS aims to find an appropriate positive, i.e., one with the same speaker identity but a different recording condition, in the latent space using clustering assignments and a memory queue of positive embeddings. SSPS improves SV performance for both SimCLR and DINO, reaching 2.57% and 2.53% EER, respectively, and outperforming SOTA SSL methods on VoxCeleb1-O. In particular, SimCLR-SSPS achieves a 58% EER reduction by lowering intra-speaker variance, providing comparable performance to DINO-SSPS.
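The positive-selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that cluster centroids are already available (for example from k-means), that the memory queue stores `(utterance_id, embedding)` pairs, and that a suitable positive is any queued embedding from a different utterance that falls in the same cluster as the anchor. The function names `nearest_centroid` and `sample_positive`, the cosine-similarity assignment, and the fallback behaviour are all hypothetical choices made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_centroid(embedding, centroids):
    """Index of the centroid most similar to the embedding (assumed assignment rule)."""
    return max(range(len(centroids)), key=lambda k: cosine(embedding, centroids[k]))


def sample_positive(anchor_id, anchor_emb, queue, centroids, rng=random):
    """Pick a pseudo-positive for the anchor from a memory queue.

    queue: list of (utterance_id, embedding) pairs (hypothetical layout).
    Returns a (utterance_id, embedding) pair from a different utterance that
    shares the anchor's cluster, or None when no such entry exists, in which
    case a caller could fall back to the same-utterance positive.
    """
    cluster = nearest_centroid(anchor_emb, centroids)
    candidates = [
        (utt_id, emb)
        for utt_id, emb in queue
        if utt_id != anchor_id and nearest_centroid(emb, centroids) == cluster
    ]
    if not candidates:
        return None
    return rng.choice(candidates)


# Toy example: two clusters in 2-D, three queued utterances.
centroids = [[1.0, 0.0], [0.0, 1.0]]
queue = [
    ("utt_a", [0.9, 0.1]),
    ("utt_b", [0.8, 0.3]),
    ("utt_c", [0.1, 0.9]),
]
positive = sample_positive("utt_a", [0.9, 0.1], queue, centroids)
print(positive)  # ('utt_b', [0.8, 0.3]): the only other utterance in the anchor's cluster
```

Excluding the anchor's own utterance is what pushes the positive toward a different recording condition in this sketch; whether the candidate truly belongs to the same speaker depends entirely on the quality of the clustering, which the toy example takes for granted.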


@article{lepage2025_2505.14561,
  title={SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification},
  author={Theo Lepage and Reda Dehak},
  journal={arXiv preprint arXiv:2505.14561},
  year={2025}
}
