This study explores the potential of Rhythm Formant Analysis (RFA) to capture long-term temporal modulations in dementia speech. Specifically, we introduce RFA-derived rhythm spectrograms as novel features for dementia classification and regression tasks. We propose two methodologies: (1) handcrafted features derived from rhythm spectrograms, and (2) a data-driven fusion approach that integrates the proposed RFA-derived rhythm spectrograms, encoded by a vision transformer (ViT) as acoustic representations, with BERT-based linguistic embeddings. We compare both with existing features. Notably, our handcrafted features outperform eGeMAPS, yielding a relative improvement in classification accuracy and comparable performance in the regression task. The fusion approach also shows improvement: RFA spectrograms surpass Mel spectrograms in classification and achieve a regression score comparable to the baselines.
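To make the idea of a rhythm spectrogram concrete, the following is a minimal, dependency-free sketch and not the authors' RFA pipeline. It assumes a rhythm spectrogram can be approximated as a low-frequency (here 0 to 10 Hz) magnitude spectrogram computed over long, overlapping windows of the amplitude envelope. It covers only the amplitude-modulation (AM) side. The function names, the 100 Hz envelope rate, the 2 s window, and the 0.5 s hop are all illustrative assumptions.

```python
import cmath
import math


def amplitude_envelope(signal, sample_rate, env_rate=100):
    """Rectify the waveform and average it in blocks (assumed envelope rate: env_rate Hz)."""
    block = sample_rate // env_rate
    n_blocks = len(signal) // block
    return [
        sum(abs(x) for x in signal[i * block:(i + 1) * block]) / block
        for i in range(n_blocks)
    ]


def rhythm_spectrogram(envelope, env_rate=100, win_s=2.0, hop_s=0.5, max_hz=10.0):
    """Magnitude spectra from 0 to max_hz of long, overlapping envelope windows."""
    win = int(win_s * env_rate)
    hop = int(hop_s * env_rate)
    n_bins = int(max_hz * win_s) + 1  # DFT bin spacing is 1 / win_s Hz
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    frames = []
    for start in range(0, len(envelope) - win + 1, hop):
        seg = envelope[start:start + win]
        mean = sum(seg) / win
        # Remove the DC offset, then taper the window edges.
        seg = [(x - mean) * w for x, w in zip(seg, hann)]
        # Direct DFT restricted to the low-frequency bins of interest.
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / win) for n, x in enumerate(seg)))
            for k in range(n_bins)
        ])
    freqs = [k / win_s for k in range(n_bins)]
    return freqs, frames


# Synthetic stand-in for speech: a 200 Hz carrier whose amplitude is modulated at 4 Hz.
sample_rate = 8000
signal = [
    (1 + 0.8 * math.sin(2 * math.pi * 4 * t / sample_rate))
    * math.sin(2 * math.pi * 200 * t / sample_rate)
    for t in range(8 * sample_rate)
]

env = amplitude_envelope(signal, sample_rate)
freqs, frames = rhythm_spectrogram(env)

# Dominant modulation frequency in each frame of the rhythm spectrogram.
peak_hz = [freqs[max(range(len(row)), key=row.__getitem__)] for row in frames]
```

On this synthetic input, the dominant bin in each frame sits at the 4 Hz modulation rate. A frequency-modulation (FM) counterpart would presumably apply the same windowed analysis to an F0 contour instead of the envelope, but that step is not sketched here.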
@article{gogoi2025_2506.00861,
  title={Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment},
  author={Parismita Gogoi and Vishwanath Pratap Singh and Seema Khadirnaikar and Soma Siddhartha and Sishir Kalita and Jagabandhu Mishra and Md Sahidullah and Priyankoo Sarmah and S. R. M. Prasanna},
  journal={arXiv preprint arXiv:2506.00861},
  year={2025}
}