Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation

Large Language Models (LLMs), trained on large corpora, have demonstrated remarkable abilities. However, directly applying open-source LLMs like Llama to certain real-world scenarios may not be sufficient, since most of them are trained for \emph{general} purposes. Thus, demands for customizing publicly available LLMs emerge, but remain under-studied. In this work, we consider customizing pre-trained LLMs with new human preferences. Specifically, the LLM should not only meet the new preference but also preserve its original capabilities after customization. Drawing inspiration from the observation that human preference can be expressed as a reward model, we propose to cast LLM customization as optimizing the sum of two reward functions, one of which (denoted as $r_1$) was used to pre-train the LLM while the other (denoted as $r_2$) characterizes the new human preference. The obstacle here is that both reward functions are unknown, making the application of modern reinforcement learning methods infeasible. Thanks to the residual Q-learning framework, we can restore the customized LLM from the pre-trained LLM and the \emph{residual Q-function} without knowing the reward function $r_1$. Moreover, we find that for a fixed pre-trained LLM, the reward function $r_2$ can be derived from the residual Q-function, enabling us to directly learn the residual Q-function from the new human preference data via the Bradley-Terry model. We name our method Q-Adapter as it introduces an adapter module to approximate the residual Q-function for customizing the pre-trained LLM towards the new preference. Experiments based on the Llama-3.1 model on the DSP and HH-RLHF datasets illustrate the superior effectiveness of Q-Adapter in both retaining existing knowledge and learning new preferences. Code is available at this https URL.
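To make the formulation concrete, the following is a minimal sketch of the customization objective described above; the notation ($r_1$, $r_2$, $\alpha$, $Q_{\mathrm{res}}$, $\pi_{\mathrm{pre}}$) is assumed for illustration and may not match the paper's exact parameterization. The customized policy is framed as optimizing the sum of the two rewards,
\[
  \pi^\ast \;=\; \arg\max_{\pi}\; \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\bigl[\, r_1(x, y) + r_2(x, y) \,\bigr],
\]
where $r_1$ is unknown but implicitly encoded by the pre-trained LLM $\pi_{\mathrm{pre}}$, and $r_2$ is observed only through preference pairs $(y_w \succ y_l)$ under the Bradley-Terry likelihood
\[
  P(y_w \succ y_l \mid x) \;=\; \sigma\bigl( r_2(x, y_w) - r_2(x, y_l) \bigr).
\]
Under residual Q-learning, one common way to recover a customized policy without knowing $r_1$ (a sketch, not necessarily the paper's exact form) is
\[
  \pi_{\mathrm{custom}}(y \mid x) \;\propto\; \pi_{\mathrm{pre}}(y \mid x)\, \exp\!\Bigl( \tfrac{1}{\alpha}\, Q_{\mathrm{res}}(x, y) \Bigr),
\]
so that only the adapter module approximating $Q_{\mathrm{res}}$ needs to be trained from preference data while $\pi_{\mathrm{pre}}$ stays frozen.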
@article{li2025_2407.03856,
  title   = {Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation},
  author  = {Yi-Chen Li and Fuxiang Zhang and Wenjie Qiu and Lei Yuan and Chengxing Jia and Zongzhang Zhang and Yang Yu and Bo An},
  journal = {arXiv preprint arXiv:2407.03856},
  year    = {2025}
}