Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. In contrast, we hypothesize that city-specific environmental and cultural differences in acoustic features are beneficial for the ASC task. In this paper, we introduce City2Scene, a novel framework that leverages city features to improve ASC. City2Scene transfers the city-specific knowledge from city classification models to a scene classification model using knowledge distillation. We evaluated City2Scene on the DCASE Challenge Task 1 datasets, where each audio clip is annotated with both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling the city-specific knowledge, City2Scene effectively improves accuracy for various state-of-the-art ASC backbone models, including both CNNs and Transformers.
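As an illustration of the distillation idea, the sketch below combines a hard-label scene loss with a soft-label term that pulls a student's city predictions toward those of a frozen city classifier. The abstract does not specify how City2Scene performs the transfer, so everything here is an assumption rather than the authors' method: the auxiliary city head on the student, the temperature-scaled KL term, the `weight` and `temperature` values, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy between the softmax of `logits` and an integer label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(scene_logits, scene_label,
                      student_city_logits, teacher_city_logits,
                      temperature=2.0, weight=0.5):
    """Hypothetical combined loss: scene cross-entropy plus city distillation.

    Assumes the student exposes an auxiliary city head whose logits are
    compared with those of a frozen city-classification teacher.
    """
    hard = cross_entropy(scene_logits, scene_label)
    teacher = softmax(teacher_city_logits, temperature)
    student = softmax(student_city_logits, temperature)
    # The T^2 factor is the usual gradient rescaling in generic distillation.
    soft = temperature ** 2 * kl_divergence(teacher, student)
    return (1 - weight) * hard + weight * soft


# Toy example: 3 scene classes, 2 city classes.
loss = distillation_loss(
    scene_logits=[2.0, 0.0, 0.0], scene_label=0,
    student_city_logits=[0.5, 1.5], teacher_city_logits=[1.0, 2.0],
)
```

When the student's city logits match the teacher's, the soft term vanishes and only the weighted scene loss remains. In practice the logits would come from the CNN or Transformer backbones mentioned above rather than hand-written lists.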
@article{cai2025_2503.16862,
  title={City2Scene: Improving Acoustic Scene Classification with City Features},
  author={Yiqiang Cai and Yizhou Tan and Peihong Zhang and Yuxuan Liu and Shengchen Li and Xi Shao and Mark D. Plumbley},
  journal={arXiv preprint arXiv:2503.16862},
  year={2025}
}