SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition

23 April 2025
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
Abstract

In conventional deep speaker embedding frameworks, the pooling layer aggregates all frame-level features over time and computes their mean and standard deviation as inputs to the subsequent segment-level layers. This statistics pooling strategy produces fixed-length representations from variable-length speech segments. However, it treats all frame-level features equally and discards covariance information. In this paper, we propose the Semi-orthogonal parametric pooling of Covariance matrix (SoCov) method. SoCov pooling computes the covariance matrix from the self-attentive frame-level features and compresses it into a vector using semi-orthogonal parametric vectorization; this vector is then concatenated with the weighted standard deviation vector to form the input to the segment-level layers. The deep embedding based on SoCov is called the "sc-vector". The proposed sc-vector is compared with several baselines on the SRE21 development and evaluation sets. The sc-vector system significantly outperforms the conventional x-vector system, with a relative EER reduction of 15.5% on SRE21Eval. When self-attentive deep features are used, SoCov reduces the EER on SRE21Eval by about 30.9% relative to the conventional "mean + standard deviation" statistics.
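The pooling pipeline described in the abstract can be illustrated with a rough NumPy sketch. The abstract does not define the semi-orthogonal parametric vectorization, so everything specific here is an assumption: the scoring vector `w`, the projection `p` with orthonormal columns (drawn at random via QR rather than learned), the compression `p.T @ cov @ p` followed by taking the upper triangle, and all function names are hypothetical. It is meant only to show how a variable-length sequence of frames can yield a fixed-length vector that retains covariance information, not to reproduce the authors' method.

```python
import numpy as np

def attention_weights(h, w):
    """Softmax attention over frames. h: (T, D); w: (D,) hypothetical scoring vector."""
    scores = h @ w
    scores = scores - scores.max()          # subtract max for numerical stability
    a = np.exp(scores)
    return a / a.sum()                      # (T,), sums to 1

def weighted_stats(h, a):
    """Attention-weighted mean, covariance matrix, and standard deviation."""
    mu = a @ h                              # (D,)
    centered = h - mu
    cov = (centered * a[:, None]).T @ centered   # (D, D)
    std = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    return mu, cov, std

def semi_orthogonal(d, k, rng):
    """Random (d, k) matrix with orthonormal columns (assumed stand-in for a learned one)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def socov_like_pooling(h, w, p):
    """Assumed compression: project the covariance to (k, k), keep its upper triangle,
    then concatenate with the weighted standard deviation."""
    a = attention_weights(h, w)
    _, cov, std = weighted_stats(h, a)
    small = p.T @ cov @ p                   # (k, k), symmetric
    iu = np.triu_indices(small.shape[0])
    return np.concatenate([small[iu], std])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 8, 3
    p = semi_orthogonal(d, k, rng)
    w = rng.standard_normal(d)
    for t in (50, 120):                     # two segments of different length
        vec = socov_like_pooling(rng.standard_normal((t, d)), w, p)
        print(t, vec.shape)
```

Under these assumptions the output has `k * (k + 1) / 2 + D` entries regardless of the number of frames `T`, which is the fixed-length property that statistics pooling is meant to provide.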

@article{li2025_2504.16441,
  title={SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition},
  author={Rongjin Li and Weibin Zhang and Dongpeng Chen and Jintao Kang and Xiaofen Xing},
  journal={arXiv preprint arXiv:2504.16441},
  year={2025}
}