A correlation-permutation approach for speech-music encoders model merging

13 June 2025
Fabian Ritter-Gutierrez
Yi-Cheng Lin
Jeremy H.M Wong
Hung-yi Lee
Eng Siong Chng
Nancy F. Chen
Main text: 5 pages, 1 figure; bibliography: 2 pages
Abstract

Creating a unified speech and music model requires expensive pre-training. Model merging can instead create a unified audio model with minimal computational expense. However, direct merging is challenging when the models are not aligned in weight space. Motivated by Git Re-Basin, we introduce a correlation-permutation approach that aligns a music encoder's internal layers with those of a speech encoder, extending previous work to the merging of transformer layers. The method computes a permutation matrix that maximizes feature-wise cross-correlations between the two models, layer by layer, enabling effective fusion of these otherwise disjoint models. The merged model retains speech capabilities while significantly improving music performance, gaining 14.83 points in average score over linear-interpolation model merging. This work enables the creation of unified audio models from independently trained encoders.
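As a rough illustration of the procedure described above, the sketch below derives a permutation of one encoder's hidden units from feature-wise cross-correlations and then interpolates the aligned weights. It assumes per-layer activation matrices from both encoders on shared audio have already been collected; the function names, shapes, and the interpolation weight alpha are illustrative assumptions, not taken from the paper's code.

# Minimal sketch of a correlation-permutation merge for one layer.
# Assumes activations from both encoders on shared audio are available;
# all names here are illustrative, not from the paper's implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def correlation_permutation(feats_a, feats_b):
    """Permutation of encoder B's units that best aligns them, by
    feature-wise cross-correlation, with encoder A at one layer.

    feats_a, feats_b: (num_samples, num_units) activations from the
    speech encoder (A) and the music encoder (B).
    """
    # Standardize each unit so the dot product becomes a correlation.
    a = (feats_a - feats_a.mean(0)) / (feats_a.std(0) + 1e-8)
    b = (feats_b - feats_b.mean(0)) / (feats_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                     # (units_a, units_b)
    # Hungarian algorithm: one-to-one assignment maximizing total correlation.
    rows, cols = linear_sum_assignment(corr, maximize=True)
    perm = np.zeros_like(corr)
    perm[rows, cols] = 1.0                      # permutation matrix P
    return perm

def merge_layer(w_a, w_b, perm, alpha=0.5):
    """Interpolate one layer's weights after permuting B's output units
    (rows of the weight matrix) into A's ordering. The next layer's
    input dimension must be permuted with perm.T to stay consistent."""
    return alpha * w_a + (1 - alpha) * (perm @ w_b)

Applying this layer by layer, with the matching inverse permutation on each following layer's inputs, brings the two encoders into a common weight-space basis, after which plain linear interpolation becomes meaningful.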

@article{ritter-gutierrez2025_2506.11403,
  title={A correlation-permutation approach for speech-music encoders model merging},
  author={Fabian Ritter-Gutierrez and Yi-Cheng Lin and Jeremy H.M Wong and Hung-yi Lee and Eng Siong Chng and Nancy F. Chen},
  journal={arXiv preprint arXiv:2506.11403},
  year={2025}
}