Scalable Model Merging with Progressive Layer-wise Distillation

18 February 2025
Jing Xu
Jiazheng Li
Jingzhao Zhang
Abstract

Model merging offers an effective way to integrate the capabilities of multiple fine-tuned models. However, performance degradation of the merged model remains a challenge, particularly when little or no data is available. This paper first highlights the necessity of domain-specific data for model merging by proving that data-agnostic algorithms can have arbitrarily bad worst-case performance. Building on this theoretical insight, we explore the relationship between model merging and distillation, introducing a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation). Contrary to the common belief that layer-wise training hurts performance, we show that layer-wise teacher-student distillation not only enhances scalability but also improves model merging performance. We conduct extensive experiments to show that, compared to existing few-shot merging methods, ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements on vision and NLU tasks, respectively. Furthermore, we extend the experiments to models with over 10B parameters, showcasing the exceptional scalability of ProDistill.
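To make the layer-wise teacher-student idea in the abstract concrete, the sketch below merges two fine-tuned models one layer at a time: for each layer, it fits a single interpolation coefficient between the two teachers' weights so that the merged layer reproduces each teacher's activations on a few calibration examples, then freezes that layer and moves on. This is a minimal PyTorch sketch under assumed simplifications (two teachers with identical nn.Sequential architectures, one learned scalar per layer, MSE matching); the names prodistill_sketch, calib_batches, and coeff are illustrative and not taken from the paper, which may parameterize the merge differently.

import copy
import torch
import torch.nn as nn
from torch.func import functional_call

def prodistill_sketch(teachers, calib_batches, lr=1e-2, steps=100):
    """Merge two fine-tuned teachers layer by layer (illustrative sketch only).

    teachers: two nn.Sequential models with identical layer structure.
    calib_batches: one small tensor of calibration inputs per teacher/domain.
    """
    student = copy.deepcopy(teachers[0])
    t_inputs = [x.clone() for x in calib_batches]   # each teacher's own activations
    s_inputs = [x.clone() for x in calib_batches]   # the student's (merged) activations

    for li, layer in enumerate(student):
        # Detached copies of both teachers' weights for this layer.
        w = [{n: p.detach() for n, p in t[li].named_parameters()} for t in teachers]

        # Teacher targets: each teacher's layer applied to its own activations.
        with torch.no_grad():
            targets = [teachers[k][li](t_inputs[k]) for k in range(len(teachers))]

        if w[0]:  # only parameterized layers need a merging coefficient
            coeff = nn.Parameter(torch.tensor(0.5))  # learned interpolation weight (assumption)
            opt = torch.optim.Adam([coeff], lr=lr)
            for _ in range(steps):
                merged = {n: coeff * w[0][n] + (1 - coeff) * w[1][n] for n in w[0]}
                # The student layer with interpolated weights should match both teachers.
                loss = sum(
                    nn.functional.mse_loss(
                        functional_call(layer, merged, (s_inputs[k],)), targets[k])
                    for k in range(len(teachers)))
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():  # freeze the merged weights into the student
                for n, p in layer.named_parameters():
                    p.copy_(coeff * w[0][n] + (1 - coeff) * w[1][n])

        with torch.no_grad():  # advance cached activations to the next layer
            t_inputs = [teachers[k][li](t_inputs[k]) for k in range(len(teachers))]
            s_inputs = [layer(x) for x in s_inputs]

    return student

Because each layer is distilled and then frozen before the next one is touched, only one layer's activations and coefficient are optimized at a time, which is consistent with the scalability claim in the abstract.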

@article{xu2025_2502.12706,
  title={Scalable Model Merging with Progressive Layer-wise Distillation},
  author={Jing Xu and Jiazheng Li and Jingzhao Zhang},
  journal={arXiv preprint arXiv:2502.12706},
  year={2025}
}