ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

23 April 2025
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
Abstract

The post-training phase of large language models is essential for enhancing capabilities such as instruction-following, reasoning, and alignment with human preferences. However, it demands extensive high-quality data and poses risks like overfitting, alongside significant computational costs due to repeated post-training and evaluation after each base model update. This paper introduces Param$\Delta$, a novel method that streamlines post-training by transferring knowledge from an existing post-trained model to a newly updated base model with ZERO additional training. By computing the difference between post-trained model weights ($\Theta_\text{post}$) and base model weights ($\Theta_\text{base}$), and adding this to the updated base model ($\Theta'_\text{base}$), we define the Param$\Delta$ Model as: $\Theta_{\text{Param}\Delta} = \Theta_\text{post} - \Theta_\text{base} + \Theta'_\text{base}$. This approach surprisingly equips the new base model with post-trained capabilities, achieving performance comparable to direct post-training. We analyze Llama3, Llama3.1, Qwen, and DeepSeek-distilled models. Results indicate the Param$\Delta$ Model effectively replicates traditional post-training. For example, the Param$\Delta$ Model obtained from the 70B Llama3-inst, Llama3-base, and Llama3.1-base models attains approximately 95% of the Llama3.1-inst model's performance on average. Param$\Delta$ brings a new perspective on how to fully leverage models in the open-weight community, where checkpoints for base and instruct models are readily available and frequently updated, by providing a cost-free framework to accelerate the iterative cycle of model development.
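The Param$\Delta$ construction reduces to an element-wise operation over three checkpoints that share the same architecture. The following is a minimal sketch using PyTorch and Hugging Face Transformers; the model identifiers are illustrative placeholders, not the exact checkpoints evaluated in the paper.

# Minimal sketch of ParamΔ weight mixing.
# Assumes all three checkpoints share the same architecture, so their
# state dicts have identical keys and tensor shapes.
import torch
from transformers import AutoModelForCausalLM

def load_weights(name: str) -> dict:
    """Load a checkpoint on CPU and return its state dict."""
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    return model.state_dict()

theta_post = load_weights("old-base-instruct")   # Θ_post: post-trained model
theta_base = load_weights("old-base")            # Θ_base: original base model
theta_base_new = load_weights("new-base")        # Θ'_base: updated base model

# Θ_ParamΔ = Θ_post − Θ_base + Θ'_base, computed per parameter tensor.
theta_param_delta = {
    key: theta_post[key] - theta_base[key] + theta_base_new[key]
    for key in theta_base_new
}

# Load the mixed weights into a copy of the updated base model and save it.
param_delta_model = AutoModelForCausalLM.from_pretrained(
    "new-base", torch_dtype=torch.bfloat16
)
param_delta_model.load_state_dict(theta_param_delta)
param_delta_model.save_pretrained("param-delta-model")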

@article{cao2025_2504.21023,
  title={Param$\Delta$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost},
  author={Sheng Cao and Mingrui Wu and Karthik Prasad and Yuandong Tian and Zechun Liu},
  journal={arXiv preprint arXiv:2504.21023},
  year={2025}
}