Learning Disentangled Speech Representations

4 November 2023

Yusuf Brima


Main: 10 pages, 8 figures, 6 tables. Bibliography: 2 pages. Appendix: 3 pages.

Abstract

Disentangled representation learning from speech remains limited despite its importance in many application domains. A key challenge is the lack of speech datasets with known generative factors against which methods can be evaluated. This paper proposes SynSpeech, a novel synthetic speech dataset with ground-truth factors that enables research on disentangling speech representations. We plan to present a comprehensive study evaluating supervised techniques using established supervised disentanglement metrics. This benchmark dataset and framework address the gap in the rigorous evaluation of state-of-the-art disentangled speech representation learning methods. Our findings will provide insights to advance this underexplored area and enable more robust speech representations.
