CoLMbo: Speaker Language Model for Descriptive Profiling

11 June 2025

Main: 6 pages

6 figures

Bibliography: 1 page

3 tables

Abstract

Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or context-rich descriptions. These models primarily extract embeddings for speaker identification and do not capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that addresses these limitations by integrating a speaker encoder with prompt-based conditioning, which allows it to generate detailed captions from speaker embeddings. CoLMbo uses user-defined prompts to adapt dynamically to new speaker characteristics and to provide customized descriptions, including regional dialect variations and age-related traits. The approach enhances traditional speaker profiling and also performs well in zero-shot scenarios across diverse datasets.
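The abstract does not say how the speaker embedding reaches the language model. As an illustration only, the following minimal sketch shows one common way to condition a text generator on an embedding: project the embedding into a few prefix vectors and place them ahead of the prompt's token embeddings. Every name here (`make_projection`, `speaker_prefix`, `build_conditioned_input`), the dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Random stand-in for a learned linear layer, shape (out_dim, in_dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def speaker_prefix(speaker_embedding, projection, prefix_len, model_dim):
    """Map one speaker embedding to prefix_len vectors of size model_dim."""
    if len(projection) != prefix_len * model_dim:
        raise ValueError("projection must have prefix_len * model_dim rows")
    # Linear projection: one dot product per output coordinate.
    flat = [sum(w * x for w, x in zip(row, speaker_embedding)) for row in projection]
    # Split the flat output into prefix_len consecutive vectors.
    return [flat[i * model_dim:(i + 1) * model_dim] for i in range(prefix_len)]


def build_conditioned_input(prefix, prompt_token_embeddings):
    """Place the speaker prefix ahead of the prompt's token embeddings."""
    return prefix + prompt_token_embeddings


# Assumed toy sizes: 8-dim speaker embedding, 2 prefix vectors, 4-dim model.
speaker_embedding = [0.5] * 8
projection = make_projection(in_dim=8, out_dim=2 * 4)
prefix = speaker_prefix(speaker_embedding, projection, prefix_len=2, model_dim=4)

# Placeholder embeddings for a 3-token prompt such as "Describe the speaker".
prompt = [[0.0] * 4 for _ in range(3)]
sequence = build_conditioned_input(prefix, prompt)
print(len(sequence), len(sequence[0]))  # 5 4
```

In a scheme of this kind, changing the prompt changes which attribute the generated caption describes, while the prefix carries the speaker information; the sequence would then be passed to a text decoder, which is omitted here.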

@article{baali2025_2506.09375,
  title={CoLMbo: Speaker Language Model for Descriptive Profiling},
  author={Massa Baali and Shuo Han and Syed Abdul Hannan and Purusottam Samal and Karanveer Singh and Soham Deshmukh and Rita Singh and Bhiksha Raj},
  journal={arXiv preprint arXiv:2506.09375},
  year={2025}
}
