From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but it suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that task interference arises from the summation of irrelevant deltas during model retrieval. To address this issue, we apply random orthogonal transformations to the deltas, decorrelating them so that irrelevant deltas approximately cancel each other out during retrieval. We show that this approach drastically reduces interference, improving performance across both vision and language tasks. Since these transformations are fully defined by random seeds, adding new models requires no extra memory. Furthermore, their data- and model-agnostic nature enables easy addition or removal of models with minimal compute overhead, supporting efficient and flexible multi-model serving.
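As a rough illustration of this compress-and-retrieve view, the sketch below superposes several task deltas after seed-defined random orthogonal transforms and then retrieves one task by inverting its transform. It is a minimal sketch under simplifying assumptions: each model is a flat parameter vector, the orthogonal transform is a permutation plus sign flips, and all names (make_transform, DIM, etc.) are illustrative, not the paper's implementation. It only demonstrates the decorrelation effect: after randomization, the residual interference is essentially uncorrelated with the target delta, whereas plain summation leaves coherent interference.

```python
import numpy as np

DIM = 200_000  # toy flattened parameter count (assumption)
rng = np.random.default_rng(0)

# Correlated task deltas: each shares a common component, mimicking
# models fine-tuned from the same base on related data (toy data).
common = rng.normal(scale=0.1, size=DIM)
deltas = [common + rng.normal(scale=0.1, size=DIM) for _ in range(3)]

def make_transform(seed, dim):
    """Seed-defined random orthogonal transform: permutation + sign flips."""
    rg = np.random.default_rng(seed)
    perm = rg.permutation(dim)
    signs = rg.choice([-1.0, 1.0], size=dim)
    inv_perm = np.argsort(perm)
    fwd = lambda v: signs * v[perm]          # forward transform
    inv = lambda v: (v * signs)[inv_perm]    # exact inverse
    return fwd, inv

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# (a) Plain summation: interference on task 1 is the coherent sum of the
# other deltas, which stays aligned with delta_1 via the shared component.
plain_interf = deltas[0] + deltas[2]
print("plain merging,  cos(interference, delta_1):", cos(plain_interf, deltas[1]))

# (b) Randomized superposition: transform each delta with its own
# seed-defined orthogonal map before summing ("compress") ...
superposed = np.zeros(DIM)
for seed, d in enumerate(deltas):
    fwd, _ = make_transform(seed, DIM)
    superposed += fwd(d)

# ... then retrieve task 1 by inverting its transform; the leftover
# cross terms are decorrelated, near-random noise.
_, inv1 = make_transform(1, DIM)
rand_interf = inv1(superposed) - deltas[1]
print("randomized,     cos(interference, delta_1):", cos(rand_interf, deltas[1]))
```

In this setup, the plain-summation interference shows a substantial cosine similarity with the target delta, while the randomized version is close to zero. Because each transform is reconstructed from its seed at retrieval time, adding a model only requires adding its transformed delta to the running sum, and removing one amounts to subtracting it, consistent with the memory- and compute-efficiency claims above.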
@article{zhou2025_2505.11204,
  title   = {RanDeS: Randomized Delta Superposition for Multi-Model Compression},
  author  = {Hangyu Zhou and Aaron Gokaslan and Volodymyr Kuleshov and Bharath Hariharan},
  journal = {arXiv preprint arXiv:2505.11204},
  year    = {2025}
}