Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

26 May 2025
Yi Liu
Dianqing Liu
Mingye Zhu
Junbo Guo
Yongdong Zhang
Zhendong Mao
Main: 8 pages · 3 figures · 9 tables · Bibliography: 4 pages · Appendix: 6 pages
Abstract

The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality and customizable outputs. However, traditional alignment methods often require retraining large pretrained models, making it difficult to quickly adapt and optimize LLMs for diverse applications. To address this limitation, we propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling. In this framework, the unaligned upstream model serves as the proposal distribution, while the alignment process is framed as secondary sampling based on an autoregressive alignment module that acts as an estimator of the importance weights. This design enables a natural detachment of the alignment module from the target aligned model, improving flexibility and scalability. Based on this model, we derive an efficient sequence-level training strategy for the alignment module, which operates independently of the proposal module. Additionally, we develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods. Experimental evaluations on two leading open-source LLMs across diverse tasks, including instruction following, domain adaptation, and preference optimization, demonstrate that our approach consistently outperforms baseline models.

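To picture the token-level resampling described in the abstract, the following minimal sketch (not the authors' code) shows one decoding step: candidate tokens are drawn from the unaligned proposal model, then one candidate is resampled in proportion to importance weights supplied by a detached alignment module. The helper alignment_log_weight (returning an estimated log importance weight for a candidate token given the prefix) and the candidate count k are illustrative assumptions.

    import torch

    def resample_next_token(proposal_logits, alignment_log_weight, prefix_ids, k=8):
        """One step of proposal sampling followed by importance resampling (sketch)."""
        # 1) Proposal step: sample k candidate tokens from the unaligned upstream model.
        probs = torch.softmax(proposal_logits, dim=-1)                    # shape [vocab]
        candidates = torch.multinomial(probs, num_samples=k, replacement=True)

        # 2) Weighting step: the alignment module estimates log importance weights,
        #    roughly log p_aligned(token | prefix) - log p_proposal(token | prefix).
        log_w = torch.tensor([float(alignment_log_weight(prefix_ids, int(c)))
                              for c in candidates])

        # 3) Resampling step: keep one candidate with probability proportional to its weight.
        idx = torch.multinomial(torch.softmax(log_w, dim=-1), num_samples=1)
        return int(candidates[idx])

Iterating this step token by token is what lets the alignment module stay detached from the frozen proposal model while avoiding a single expensive sequence-level rejection pass.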
@article{liu2025_2505.19700,
  title={Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models},
  author={Yi Liu and Dianqing Liu and Mingye Zhu and Junbo Guo and Yongdong Zhang and Zhendong Mao},
  journal={arXiv preprint arXiv:2505.19700},
  year={2025}
}