Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation

4 July 2024
Yi-Chen Li
Fuxiang Zhang
Wenjie Qiu
Lei Yuan
Chengxing Jia
Zongzhang Zhang
Yang Yu
Bo An
Abstract

Large Language Models (LLMs), trained on large corpora, have demonstrated remarkable abilities. However, it may not be sufficient to directly apply open-source LLMs like Llama to certain real-world scenarios, since most of them are trained for \emph{general} purposes. Thus, the demand for customizing publicly available LLMs has emerged but remains under-studied. In this work, we consider customizing pre-trained LLMs with new human preferences. Specifically, the LLM should not only meet the new preference but also preserve its original capabilities after customization. Drawing inspiration from the observation that human preference can be expressed as a reward model, we propose to cast LLM customization as optimizing the sum of two reward functions, one of which (denoted as r_1) was used to pre-train the LLM while the other (denoted as r_2) characterizes the new human preference. The obstacle here is that both reward functions are unknown, making the application of modern reinforcement learning methods infeasible. Thanks to the residual Q-learning framework, we can restore the customized LLM from the pre-trained LLM and the \emph{residual Q-function} without the reward function r_1. Moreover, we find that for a fixed pre-trained LLM, the reward function r_2 can be derived from the residual Q-function, enabling us to directly learn the residual Q-function from the new human preference data upon the Bradley-Terry model. We name our method Q-Adapter as it introduces an adapter module to approximate the residual Q-function for customizing the pre-trained LLM towards the new preference. Experiments based on the Llama-3.1 model on the DSP dataset and HH-RLHF dataset illustrate the superior effectiveness of Q-Adapter on both retaining existing knowledge and learning new preferences. Code is available at \url{https://github.com/mansicer/Q-Adapter}.
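
Below is a minimal sketch of the two ideas named in the abstract: combining a frozen pre-trained LM with adapter-predicted residual Q-values at decoding time, and fitting preference data with a Bradley-Terry objective. It is not the authors' released implementation (see the linked GitHub repository for that); the tensor shapes, the `alpha` temperature, and the toy per-response scores are illustrative assumptions.

```python
# Illustrative sketch only -- not the Q-Adapter reference code.
import torch
import torch.nn.functional as F


def customized_logits(base_logits: torch.Tensor,
                      residual_q: torch.Tensor,
                      alpha: float = 1.0) -> torch.Tensor:
    """Combine the frozen pre-trained LM's logits with adapter-predicted
    residual Q-values: the customized policy is proportional to the
    pre-trained policy reweighted by exp(residual_q / alpha)."""
    return base_logits + residual_q / alpha


def bradley_terry_loss(score_chosen: torch.Tensor,
                       score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood on sequence-level scores.
    In the paper these scores are derived from the residual Q-function;
    here they are simply given as one scalar per preference pair."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()


if __name__ == "__main__":
    batch, seq, vocab = 2, 5, 32
    base_logits = torch.randn(batch, seq, vocab)   # frozen pre-trained LM
    residual_q = torch.randn(batch, seq, vocab)    # adapter output (assumed)
    probs = F.softmax(customized_logits(base_logits, residual_q), dim=-1)
    print("customized next-token distribution:", probs.shape)

    chosen = torch.tensor([1.3, 0.2])     # toy scores for preferred responses
    rejected = torch.tensor([0.4, -0.1])  # toy scores for dispreferred ones
    print("preference loss:", bradley_terry_loss(chosen, rejected).item())
```

The adapter keeps the base model untouched, which is what allows the customized model to retain the original capabilities while the residual Q-function encodes only the new preference.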

@article{li2025_2407.03856,
  title={Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation},
  author={Yi-Chen Li and Fuxiang Zhang and Wenjie Qiu and Lei Yuan and Chengxing Jia and Zongzhang Zhang and Yang Yu and Bo An},
  journal={arXiv preprint arXiv:2407.03856},
  year={2025}
}